Determination of methylation of nucleic acid sequences

ABSTRACT

The invention relates to a method of detecting the precise locations of methyl-cytosines in a given nucleic acid sequence. In particular, the invention features a method which includes sequencing a template nucleic acid that is attached to a hairpin nucleic acid or double-stranded nucleic acid anchor containing specifically designed sites for nicking or other endonucleases. The template nucleic acid is then regenerated to single-stranded form via methods described herein and treated to convert either the methylated cytosines or the non-methylated cytosines, and the template nucleic acid is then re-sequenced. The results of the first and second sequencing reactions are then compared.

BACKGROUND

In many eukaryotes, between 10 and 30% of cytosine bases are modified by the enzymatic addition of a methyl group. Although this modification does not interfere with the fidelity of DNA replication processes, it enables modulation of diverse cellular processes through protein interactions with hypo- or hyper-methylated sequences. These methylated sequences are not randomly dispersed throughout a genome but instead are found almost exclusively in repetitive CpG sequences in the regulatory regions upstream of many genes. Methylation of these sequences is associated with repression of gene activity and can result in global changes to gene expression. For example, methylation plays a central role in the inactivation of one of the two X chromosomes in female cells, which ensures that females do not produce twice the level of X-linked gene products that males do. Methylation also underlies the selective repression of either the maternally or the paternally inherited copy of pairs of alleles in a process known as genetic imprinting. It also silences transposable elements whose expression would otherwise be deleterious to the genome.

Patterns of methylation in a genome are heritable because of the semi-conservative nature of DNA replication. During this process, the daughter strand, newly replicated on a methylated template strand, is not initially methylated, but the template strand directs methyltransferase enzymes to fully methylate both strands. Thus, methylation patterns carry an extra level of genetic information down through the generations in addition to the information inherited in the primary sequence of the four nucleotides.

Aberrant patterns of genomic methylation also correlate with disease states and are among the earliest and most common alterations found in human malignancies. Moreover, mistakes made during the establishment of methylation patterns during development underlie several specific inherited disorders. Consequently, there is a demand for high-throughput approaches for profiling the methylation status of many genes in parallel, both for research purposes and for clinical applications.

Many methods already exist for detecting the methylation of DNA, and they can be broadly classified by the level of sequence-specific information they produce. On the simplest level, there are techniques that only yield information on overall levels of methylation within a genome. For example, the methylated and unmethylated components of DNase I-treated DNA can be separated by reverse-phase HPLC owing to their difference in hydrophobicity. Such methods are simple but do not give any information on the sequence context of the methylation sites. Alternatively, pairs of restriction endonucleases that recognize the same sequence but have different sensitivities to cytosine methylation at that sequence can be used. Methylation at this sequence will render it refractory to cleavage by one enzyme but sensitive to the other. If no cytosine bases are methylated in a sequence, both enzymes will produce identically sized restriction fragments. In contrast, if methylation is present, the enzymes will produce fragments of different sizes that can be distinguished by standard analytical techniques such as electrophoresis through agarose. If Southern blot analysis is subsequently performed and the bands are probed with a labelled fragment from a gene of interest, then the sequence context of the methylation site can be investigated. These methods are limited because they depend on the availability of useful restriction enzymes and are confined to the study of methylation patterns within sequences that contain those restriction sites.

Methods that do not rely on sequence context but which can detect methylation at any chosen sequence are mainly based on the sodium bisulfite reaction. Under controlled conditions, this reagent converts cytosine to uracil while methyl-cytosine remains unmodified. If the treated DNA is then sequenced, the detection of a cytosine indicates that the cytosine is methylated, because it would otherwise have been converted to a uracil.
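The conversion rule above can be sketched as a small simulation. This is an idealized model that assumes complete conversion of every unmethylated cytosine and no conversion of 5-methylcytosine; the function name `bisulfite_convert` and the convention of passing methylated positions as 0-based indices are illustrative assumptions, not part of the described method.

```python
from typing import Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Idealized sodium bisulfite treatment of a single DNA strand.

    Every cytosine becomes uracil unless its 0-based position is listed in
    `methylated` (5-methylcytosine), in which case it is left unchanged.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("U")  # unmethylated C is deaminated to U
        else:
            converted.append(base)  # 5mC and all other bases are untouched
    return "".join(converted)


# Example: the C at index 2 is methylated; the C at index 5 is not.
print(bisulfite_convert("ATCGACGT", {2}))  # ATCGAUGT
```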

Standard Sanger sequencing procedures have the disadvantage that only a limited number of sequencing reactions can be performed at the same time. Moreover, PCR amplification and sub-cloning may be necessary to produce sufficient quantities of DNA for sequencing, and both methods can introduce artifacts into the sequence, including changes in methylation.

Microarrays are arrays of molecular probes, such as nucleic acid molecules, arranged systematically on a solid, generally flat surface. Each probe site carries a reagent such as a single-stranded nucleic acid, whose molecular recognition of a complementary nucleic acid molecule leads to a detectable signal, often based on fluorescence. Microarrays carrying many thousands of probe sites can be used to monitor gene expression profiles over a large number of genes in a single experiment in a hybridisation-based format.

The nucleic acid probes on the microarrays are generally made in two ways. A combination of photochemistry and DNA synthesis allows base-by-base synthesis of the probes in situ. This is the approach pioneered by Affymetrix for growing short strands of around 25 bases. Their ‘genechips’ are commercially available and widely used (e.g., Wodicka et al., 1997, Nature Biotechnology 15:1359-1367), despite the expense of making arrays designed for a particular experiment. Another method for preparing microarrays is to use a robot to spot small (nL) volumes of nucleic acid sequences onto discrete areas of the surface. Microarrays prepared in this manner have less dense features than Affymetrix arrays but are more universal and cheaper to prepare (e.g., Schena et al., 1995, Science 270:467-470). The main drawback of all types of standard microarrays is the complex hardware required to achieve a spatial distribution of multiple copies of the same DNA sequence. Such limitations are overcome by single molecule array technology, e.g., as described in International Patent App. WO 00/06770.

In addition to hybridisation-based detection, a number of other biochemical assays have been applied to nucleic acid microarrays, particularly in the area of genotyping. A common assay is to use a DNA polymerase or DNA ligase to incorporate a fluorescent marker onto the array. The enzymatic incorporation allows the identity of one or more bases to be determined based on the identity of the labelled marker. Such extension assays have been developed by a number of companies and academic groups for typing single nucleotide polymorphisms (“SNPs”). The ability to perform multiple cycles of extension reactions on these platforms would be advantageous, as it gives more information about the nature of the sample under investigation. For example, performing multiple extensions complementary to a template strand yields information on the sequence of the template strand. During such a ‘sequencing by synthesis’ reaction, a new strand, base-paired to the template nucleic acid, is built up in the 5′ to 3′ direction by incorporation of individual nucleotides complementary to those in the template, starting opposite the 3′ end of the template. The end result of a series of such incorporations is that the template nucleic acid is no longer single-stranded; instead, it is base-paired to a synthetic complementary strand. The result is a double-stranded nucleic acid molecule: the original template nucleic acid and its complementary strand, attached to the solid substrate.
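The relationship between the template and the strand built during sequencing by synthesis can be illustrated with a short sketch. It assumes ideal Watson-Crick pairing on a DNA template and ignores chemistry details such as labels and reversible terminators; the helper names `incorporations` and `template_from_incorporations` are hypothetical.

```python
from typing import List

# Base incorporated by the polymerase opposite each template base (DNA template assumed).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def incorporations(template_5to3: str) -> List[str]:
    """Nucleotides added, in order, when a new strand is built 5'->3'
    starting opposite the 3'-terminal base of the template."""
    return [PAIR[b] for b in reversed(template_5to3.upper())]


def template_from_incorporations(added: List[str]) -> str:
    """Recover the template (written 5'->3') from the ordered incorporations."""
    return "".join(PAIR[b] for b in reversed(added))


added = incorporations("ATCGGA")
print("".join(added))                       # TCCGAT, the new strand written 5'->3'
print(template_from_incorporations(added))  # ATCGGA
```

The synthesized strand is simply the reverse complement of the template, which is why the template sequence can be read back from the order of incorporation events.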

Once such a sequencing reaction is complete, removal of the synthetic strand complementary to the template would permit re-use of the template nucleic acid, e.g., in another sequencing reaction to verify the results of the first reaction. In another application, the sequenced strand becomes available for hybridization of nucleic acids, e.g., DNA, or of DNA mimics, e.g., PNA.

In contrast, the complete removal of both the template strand and its synthetic complement would allow new template nucleic acids to be attached to the solid substrate to form a new array.

SUMMARY OF THE INVENTION

The invention relates to a method of detecting the precise locations of methyl-cytosines in a given nucleic acid sequence. In particular, the invention features a method which includes sequencing a template nucleic acid that is attached to a hairpin nucleic acid or double-stranded nucleic acid anchor. The template nucleic acid is then regenerated to single-stranded form via methods described herein, and then treated with sodium bisulfite, which converts the cytosines in the template nucleic acid to uracils unless the cytosines are methylated, in which case they remain as cytosines. The template nucleic acid is then re-sequenced. The results of the first and second sequencing reactions are then compared. The presence of a cytosine in the first sequence and a uracil in the corresponding location in the second sequence indicates that the cytosine at that location is unmethylated. Conversely, the presence of a cytosine at a particular location in both the first and second sequences indicates that the cytosine at that location is a methyl-cytosine.

The invention makes use of a hairpin nucleic acid, or a double-stranded nucleic acid anchor, which allows templates to be regenerated according to the invention. In particular, the hairpin nucleic acid or double-stranded nucleic acid anchor contains a restriction site, preferably for a nicking endonuclease, located before or at the 3′ end of the hairpin nucleic acid. The hairpin nucleic acid or double-stranded nucleic acid anchor allows the regeneration of a single-stranded nucleic acid template following its conversion to a double-stranded product, e.g., as a result of a sequencing reaction.

The invention features a method for detecting a methylated cytosine in a template nucleic acid, the method including: (a) providing a hairpin-template complex, including: (i) a hairpin nucleic acid, where the hairpin nucleic acid is self-complementary and has a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid, and where the hairpin nucleic acid is a self-hybrid; and (ii) a single-stranded template nucleic acid; where the 5′ end of the hairpin nucleic acid is attached to the 3′ end of the single-stranded template nucleic acid; (b) sequencing the single-stranded template nucleic acid of the hairpin-template complex, thereby producing: (i) a first sequence; and (ii) a hairpin-template-complement complex, including the hairpin-template complex of (a), and further including a synthetic nucleic acid strand complementary to the template nucleic acid, where the synthetic nucleic acid strand is hybridized to the template nucleic acid, and where the complementary nucleic acid strand is attached at its 5′ end to the 3′ end of the hairpin nucleic acid; (c) removing the complementary nucleic acid strand from the hairpin-template-complement complex, thereby recovering the hairpin-template complex; (d) treating the hairpin-template complex with sodium bisulfite, thereby producing a sodium bisulfite-treated template nucleic acid; (e) sequencing the sodium bisulfite-treated template nucleic acid of (d), thereby producing a second sequence; and (f) comparing the first sequence and the second sequence, where the presence of a cytosine in the second sequence indicates that the cytosine at that position is methylated; thereby detecting a methylated cytosine in the template nucleic acid. The hairpin nucleic acid can be attached to a solid substrate.

The invention also features an addressable single molecule array including a hairpin-template complex, including: (a) a hairpin nucleic acid, where the hairpin nucleic acid is self-complementary and has a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid, and where the hairpin nucleic acid is a self-hybrid, and where the hairpin nucleic acid is attached to a solid substrate; and (b) a single-stranded template nucleic acid, where the 5′ end of the hairpin nucleic acid is attached to the 3′ end of the single-stranded template nucleic acid. Such a single molecule addressable array can include a plurality of the hairpin-template complexes, where adjacent complexes are separated by a distance of at least 10 nm, at least 100 nm, or at least 250 nm. The addressable array can include complexes at a density of 10⁶ to 10⁹ polynucleotides per cm², or 10⁷ to 10⁸ molecules per cm².
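The spacing and density figures above can be related with simple arithmetic. As a rough sketch, assuming the molecules sit on a square grid (an assumption made here only for the calculation), a minimum spacing d in nm caps the density at (10⁷/d)² molecules per cm², since 1 cm = 10⁷ nm. A 10 nm spacing therefore allows at most 10¹² molecules per cm² and a 250 nm spacing at most 1.6 × 10⁹ per cm², while densities of 10⁶ to 10⁹ per cm² correspond to typical spacings of roughly 10 µm down to about 316 nm. The function names are hypothetical.

```python
import math

NM_PER_CM = 1e7  # 1 cm = 10^7 nm


def max_density_per_cm2(min_spacing_nm: float) -> float:
    """Upper bound on molecules per cm^2 for neighbours on a square grid of this spacing."""
    return (NM_PER_CM / min_spacing_nm) ** 2


def mean_spacing_nm(density_per_cm2: float) -> float:
    """Typical nearest-neighbour spacing (square-grid approximation) for a given density."""
    return NM_PER_CM / math.sqrt(density_per_cm2)


for d in (10, 100, 250):
    print(f"spacing >= {d} nm -> at most {max_density_per_cm2(d):.2e} molecules per cm^2")
for rho in (1e6, 1e7, 1e8, 1e9):
    print(f"{rho:.0e} molecules per cm^2 -> spacing ~ {mean_spacing_nm(rho):.0f} nm")
```

A real single molecule array is usually randomly rather than regularly spaced, so these figures are order-of-magnitude guides only.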

The invention also features a kit that includes such addressable arrays.

In a further aspect, the invention features a method for detecting a methylated cytosine in a template nucleic acid, the method including: (a) providing an anchor-template complex, including: (i) a double-stranded nucleic acid anchor, where the double-stranded nucleic acid anchor includes: (A) a first end and a second end; and (B) a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor; and (ii) a single-stranded template nucleic acid; where the 5′ end of the first end of the double-stranded nucleic acid anchor is attached to the 3′ end of the single-stranded template nucleic acid; (b) sequencing the single-stranded template nucleic acid of the anchor-template complex, thereby producing: (i) a first sequence; and (ii) an anchor-template-complement complex, including the anchor-template complex of (a), and further including a synthetic nucleic acid strand complementary to the template nucleic acid, where the synthetic nucleic acid strand is hybridized to the template nucleic acid, and where the complementary nucleic acid strand is attached at its 5′ end to the 3′ end of the first end of the double-stranded nucleic acid anchor; (c) removing the complementary nucleic acid strand from the anchor-template-complement complex, thereby recovering the anchor-template complex; (d) treating the anchor-template complex with sodium bisulfite, thereby producing a sodium bisulfite-treated anchor-template complex; (e) sequencing the sodium bisulfite-treated anchor-template complex of (d), thereby producing a second sequence; and (f) comparing the first sequence and the second sequence, where the presence of a cytosine in the second sequence indicates that the cytosine at that position in the template nucleic acid is methylated; thereby detecting a methylated cytosine in the template nucleic acid. The double-stranded nucleic acid anchor can be attached at its second end to a solid substrate.

The invention additionally features an addressable single molecule array including an anchor-template complex, including: (a) a double-stranded nucleic acid anchor, where the double-stranded nucleic acid anchor includes: (i) a first end and a second end; and (ii) a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor; and (b) a single-stranded template nucleic acid; where the 5′ end of the first end of the double-stranded nucleic acid anchor is attached to the 3′ end of the single-stranded template nucleic acid. Such an addressable single molecule array can include a plurality of the anchor-template complexes, where adjacent complexes are separated by a distance of at least 10 nm, at least 100 nm, or at least 250 nm. The addressable array can contain complexes at a density of 10⁶ to 10⁹ polynucleotides per cm², or 10⁷ to 10⁸ molecules per cm².

The invention also features a kit including such an addressable array.

In another aspect, the invention features a method for detecting a methylated cytosine in a template nucleic acid of known sequence, the method including: (a) providing a hairpin-template complex, including: (i) a hairpin nucleic acid, where the hairpin nucleic acid is self-complementary and has a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid, and where the hairpin nucleic acid is a self-hybrid; and (ii) a single-stranded template nucleic acid; where the 5′ end of the hairpin nucleic acid is attached to the 3′ end of the single-stranded template nucleic acid; (b) treating the hairpin-template complex with sodium bisulfite, thereby producing a sodium bisulfite-treated template nucleic acid; (c) sequencing the sodium bisulfite-treated template nucleic acid of (b), thereby producing a sequence; and (d) comparing the sequence of (c) and the known sequence, where the presence of a cytosine in the sequence of (c) indicates that the cytosine at that position is methylated; thereby detecting a methylated cytosine in the template nucleic acid of known sequence. The hairpin nucleic acid can be attached to a solid substrate.

The invention further features a method for detecting a methylated cytosine in a template nucleic acid of known sequence, the method including: (a) providing an anchor-template complex, including: (i) a double-stranded nucleic acid anchor, where the double-stranded nucleic acid anchor includes: (A) a first end and a second end; and (B) a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor; and (ii) a single-stranded template nucleic acid; where the 5′ end of the first end of the double-stranded nucleic acid anchor is attached to the 3′ end of the single-stranded template nucleic acid; (b) treating the anchor-template complex with sodium bisulfite, thereby producing a sodium bisulfite-treated anchor-template complex; (c) sequencing the sodium bisulfite-treated anchor-template complex of (b), thereby producing a sequence; and (d) comparing the sequence of (c) and the known sequence, where the presence of a cytosine in the sequence of (c) indicates that the cytosine at that position in the template nucleic acid is methylated; thereby detecting a methylated cytosine in the template nucleic acid. The double-stranded nucleic acid anchor can be attached at its second end to a solid substrate.

The invention also features a method for detecting a methylated cytosine in a template nucleic acid of known sequence, where one or more of the cytosines in the template nucleic acid have been converted to uracil, the method including: (a) providing a hairpin-template complex, including: (i) a hairpin nucleic acid, where the hairpin nucleic acid is self-complementary and has a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid, and where the hairpin nucleic acid is a self-hybrid; and (ii) a single-stranded template nucleic acid; where the 5′ end of the hairpin nucleic acid is attached to the 3′ end of the single-stranded template nucleic acid; (b) sequencing the template nucleic acid, thereby producing a sequence; and (c) comparing the sequence of (b) and the known sequence, where the presence of a cytosine in the sequence of (b) indicates that the cytosine at that position is methylated; thereby detecting a methylated cytosine in the template nucleic acid of known sequence. The hairpin nucleic acid can be attached to a solid substrate.

The invention features in an additional aspect a method for detecting a methylated cytosine in a template nucleic acid of known sequence, where one or more of the cytosines in the template nucleic acid have been converted to uracil, the method including: (a) providing an anchor-template complex, including: (i) a double-stranded nucleic acid anchor, where the double-stranded nucleic acid anchor includes: (A) a first end and a second end; and (B) a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor; and (ii) a single-stranded template nucleic acid; where the 5′ end of the first end of the double-stranded nucleic acid anchor is attached to the 3′ end of the single-stranded template nucleic acid; (b) sequencing the anchor-template complex, thereby producing a sequence; and (c) comparing the sequence of (b) and the known sequence, where the presence of a cytosine in the sequence of (b) indicates that the cytosine at that position in the template nucleic acid is methylated; thereby detecting a methylated cytosine in the template nucleic acid. The double-stranded nucleic acid anchor can be attached at its second end to a solid substrate.

The invention features a hairpin nucleic acid having the following characteristics: (a) being self-complementary; and (b) having a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid. The hairpin nucleic acid can further include one or more modifications to allow attachment of the hairpin nucleic acid to a solid substrate. The hairpin nucleic acid can also further include a second restriction site for a blunt-end endonuclease, the second restriction site including a second recognition sequence and a second cleavage site, where the second recognition sequence is situated so that the second cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid.

The invention also features a method for recovering a single-stranded template nucleic acid, the method including: (a) providing a single-stranded template nucleic acid attached to the 5′ end of a hairpin nucleic acid, where the hairpin nucleic acid is self-complementary and has a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid, and where the hairpin nucleic acid is a self-hybrid, and where a nucleic acid strand complementary to the template nucleic acid is attached to the 3′ end of the hairpin nucleic acid; (b) contacting the hairpin nucleic acid with the nicking endonuclease, under conditions where the nicking endonuclease cleaves before, at, or beyond the 3′ end of the hairpin nucleic acid, thereby providing a nicked hairpin-template-complement nucleic acid complex; and (c) subjecting the nicked hairpin-template-complement nucleic acid complex to conditions whereby the nucleic acid strand complementary to the template nucleic acid dissociates from the template nucleic acid; thereby recovering the single-stranded template nucleic acid. The hairpin nucleic acid can be attached to a solid substrate.
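Where the nick falls relative to the 3′ end of the hairpin can be checked with a short string-level sketch. It models the hairpin and the attached complementary strand as one continuous strand read 5′ to 3′, and models the nicking enzyme as cutting a fixed number of nucleotides 3′ of its recognition sequence on the same strand. The recognition sequence `GAGTC`, the 4-nucleotide offset, the example sequences, and the function names are all illustrative assumptions rather than values taken from this description; a real enzyme's site and cut position should be checked against its supplier's specification.

```python
from typing import List


def nick_sites(strand: str, recognition: str, cut_after: int) -> List[int]:
    """0-based nick indices on one strand written 5'->3'.

    A nick at index i breaks the backbone between strand[i-1] and strand[i].
    The enzyme is modelled (hypothetically) as cutting `cut_after` nucleotides
    3' of the end of each occurrence of `recognition` on the same strand.
    """
    s, site = strand.upper(), recognition.upper()
    hits: List[int] = []
    i = s.find(site)
    while i != -1:
        nick = i + len(site) + cut_after
        if 0 < nick < len(s):  # a cut past either end would not nick the molecule
            hits.append(nick)
        i = s.find(site, i + 1)
    return hits


def nicks_relative_to_hairpin_end(hairpin: str, complement: str,
                                  recognition: str, cut_after: int) -> List[int]:
    """Nick positions relative to the hairpin's 3' end in the continuous
    hairpin + complement strand: < 0 before, 0 at, > 0 beyond that end."""
    joined = hairpin + complement
    return [n - len(hairpin) for n in nick_sites(joined, recognition, cut_after)]


# Hypothetical hairpin: 5' arm + TTTT loop + 3' arm (GAGTC followed by 4 nt).
hairpin = "GCAAGACTC" + "TTTT" + "GAGTCTTGC"
complement = "ACGTACGT"  # hypothetical stretch of the newly synthesized strand
print(nicks_relative_to_hairpin_end(hairpin, complement, "GAGTC", 4))  # [0]
```

A result of 0 means the nick lies exactly at the junction between the hairpin and the complementary strand, so that after denaturation the complement can dissociate and leave the hairpin with its original 3′ end; a negative value would remove nucleotides from the hairpin, and a positive value would leave a few nucleotides of the complement attached.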

In another aspect, the invention features an addressable single molecule array, including a hairpin nucleic acid as described above, where the hairpin nucleic acid is attached to a solid substrate. Adjacent hairpin nucleic acids in such an array can be separated by a distance of at least 10 nm, of at least 100 nm, or of at least 250 nm. The density of the hairpin nucleic acids can be from 10⁶ to 10⁹ polynucleotides per cm², or from 10⁷ to 10⁸ molecules per cm².

The invention also features a kit including a hairpin nucleic acid as described above, and packaging components therefor. The invention also features a kit which includes an addressable array as described above.

In another aspect, the invention features a double-stranded nucleic acid anchor having the following characteristics: (a) having a first end and a second end; and (b) having a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is located before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor. The double-stranded nucleic acid anchor can be attached at its second end to a solid substrate. The double-stranded nucleic acid anchor can further include a second restriction site for a blunt-end endonuclease, the second restriction site including a second recognition sequence and a second cleavage site, where the second recognition sequence is situated so that the second cleavage site is located before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor.

The invention also features a method for recovering a single-stranded template nucleic acid, the method including: (a) providing a single-stranded template nucleic acid attached to a double-stranded nucleic acid anchor, where a nucleic acid strand complementary to the template nucleic acid is attached to the double-stranded nucleic acid anchor, and where the double-stranded nucleic acid anchor: (i) has a first end and a second end; and (ii) has a first restriction site for a nicking endonuclease, the restriction site including a recognition sequence and a cleavage site, where the recognition sequence is situated so that the cleavage site is before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor, where the single-stranded template nucleic acid is attached to the 5′ end of the first end of the double-stranded nucleic acid anchor, and where the nucleic acid strand complementary to the template nucleic acid is attached to the 3′ end of the first end of the double-stranded nucleic acid anchor; (b) contacting the double-stranded nucleic acid anchor with the nicking endonuclease, under conditions where the nicking endonuclease cleaves before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor, thereby providing a nicked anchor-template-complement nucleic acid complex; and (c) subjecting the nicked anchor-template-complement nucleic acid complex to conditions whereby the nucleic acid strand complementary to the template nucleic acid dissociates from the template nucleic acid; thereby recovering the single-stranded template nucleic acid. The double-stranded nucleic acid anchor can be attached at its second end to a solid substrate.

In another aspect, the invention features an addressable single molecule array, including a double-stranded nucleic acid anchor as described above, where the double-stranded nucleic acid anchor is attached to a solid substrate. Adjacent double-stranded nucleic acid anchors in such an array can be separated by a distance of at least 10 nm, of at least 100 nm, or of at least 250 nm. The density of the double-stranded nucleic acid anchors can be from 10⁶ to 10⁹ polynucleotides per cm², or from 10⁷ to 10⁸ molecules per cm².

The invention also features a kit including a double-stranded nucleic acid anchor as described above, and packaging components therefor. The invention also features a kit which includes an addressable array as described above.

By “methylated cytosine” is meant a cytosine with an added methyl group on the carbon 5 position.

“First sequence” and “second sequence”, as used herein, refer to the information regarding the sequential nucleotides in a nucleic acid sequence, presented in text, computer-readable, or other non-biological form; that is, the terms refer to the sequence information rather than to the physical nucleic acids themselves. By “first” and “second” sequences is meant the results of a first sequencing reaction and a second sequencing reaction. The results of the two sequencing reactions (the first and second sequences, respectively) are then compared.

By “comparing the first sequence and the second sequence” is meant that the sequential nucleotide information resulting from the first sequencing reaction is compared to the sequential nucleotide information resulting from the second sequencing reaction, and the differences between the two are noted. In the case where the template strand is sequenced and then treated with sodium bisulfite (thereby converting the unmethylated cytosines to uracils), the presence of a cytosine at a particular location in the first sequence and a cytosine in the same location in the second sequence indicates that that particular cytosine is methylated in the original template nucleic acid. The presence of a cytosine at a particular location in the first sequence and the presence of a uracil in the same location in the second sequence indicates that that particular cytosine is unmethylated in the original template nucleic acid.
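The comparison rule above can be expressed as a position-by-position scan. This sketch assumes the two sequences are already aligned and of equal length. Because a polymerase pairs adenine opposite uracil just as it does opposite thymine, a template uracil may be reported as `T`; the sketch therefore treats either `U` or `T` at a former cytosine position as evidence of conversion. The function name `call_methylation` and the `'inconclusive'` label are hypothetical.

```python
from typing import Dict


def call_methylation(first: str, second: str) -> Dict[int, str]:
    """Classify each cytosine of the first (untreated) sequence using the
    aligned second (bisulfite-treated) sequence of the same template."""
    if len(first) != len(second):
        raise ValueError("sequences must be aligned and of equal length")
    calls: Dict[int, str] = {}
    for i, (before, after) in enumerate(zip(first.upper(), second.upper())):
        if before != "C":
            continue  # only cytosines in the first sequence are informative
        if after == "C":
            calls[i] = "methylated"    # protected from bisulfite conversion
        elif after in ("U", "T"):
            calls[i] = "unmethylated"  # converted; U may be reported as T
        else:
            calls[i] = "inconclusive"  # e.g. a sequencing error at this position
    return calls


print(call_methylation("ATCGACGT", "ATCGAUGT"))
# {2: 'methylated', 5: 'unmethylated'}
```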

By “treating the hairpin-template complex with sodium bisulfite” is meant that the hairpin-template complex is contacted with an amount of sodium bisulfite under conditions whereby the unmethylated cytosines in the template nucleic acid will be chemically modified and converted to uracils. The actual protocol for treating the template nucleic acid with sodium bisulfite can be any of those known in the art, or as provided herein.

Alternatively, other methods of differentiating between methylated and unmethylated cytosines can be used, e.g., a chemical (or other) treatment that reliably converts either the cytosines or the methylated cytosines to another, specific nucleotide, after which the results of the two sequencing reactions are compared. For instance, a chemical modification that converts (unmethylated) cytosine to a different nucleotide can be used, and the differences between the results of the two sequencing reactions can be compared. Alternatively, a chemical modification that converts methyl-cytosine to a different nucleotide can be used, and the differences between the results of the two sequencing reactions can be compared.

The method can also be used to detect the presence of other modified nucleotides in a nucleic acid, given a method (chemical or otherwise, e.g., enzymatic) of specifically treating the modified nucleotides so that a subsequent sequencing reaction produces a sequence that is changed relative to the first sequencing reaction.

In one embodiment, “hairpin nucleic acid” means a single-stranded nucleic acid which is capable of forming a hairpin, that is, a nucleic acid whose sequence contains a region of internal self-complementarity enabling the formation of an intramolecular duplex or self-hybrid. “Region of self-complementarity” refers to self-complementarity over a region of 4 to 100 base pairs. When not self-hybridized, the hairpin nucleic acid can be 8 to 200 nucleotides, preferably 10 to 30 nucleotides, in length. Saying that the hairpin nucleic acid is a “self-hybrid”, or that it has “self-hybridized”, means that the hairpin nucleic acid has been exposed to conditions that allow its regions of self-complementarity to hybridize to each other, forming a double-stranded nucleic acid with a loop structure at one end and an exposed 3′ and 5′ end at the other. It is preferable, but not required, that when the hairpin nucleic acid is hybridized to itself, the exposed 3′ and 5′ ends form a blunt end.
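Whether a candidate sequence can fold into the blunt-ended hairpin described above can be checked by counting how many 5′-terminal nucleotides are the reverse complement of the 3′-terminal nucleotides. This sketch assumes a DNA sequence, perfect Watson-Crick pairing, and a minimum loop of 3 unpaired nucleotides (the `min_loop` default is an assumption, not a value from this description); it then applies the stem and overall size ranges given above. The function names and the example sequence are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def stem_length(seq: str, min_loop: int = 3) -> int:
    """Number of base pairs in the blunt-ended stem formed when the 5' end of
    `seq` folds back onto its 3' end, keeping at least `min_loop` unpaired
    nucleotides in the loop (DNA, perfect Watson-Crick pairing assumed)."""
    s = seq.upper()
    k = 0
    # Pair the (k+1)-th base from the 5' end with the (k+1)-th base from the 3' end,
    # extending the stem one base pair at a time until a mismatch or the loop limit.
    while 2 * (k + 1) + min_loop <= len(s) and s[k] == PAIR[s[-(k + 1)]]:
        k += 1
    return k


def is_candidate_hairpin(seq: str) -> bool:
    """Apply the 8-200 nt overall length and 4-100 bp stem ranges."""
    return 8 <= len(seq) <= 200 and 4 <= stem_length(seq) <= 100


print(stem_length("GCAAGACTCTTTTGAGTCTTGC"))           # 9
print(is_candidate_hairpin("GCAAGACTCTTTTGAGTCTTGC"))  # True
```

Only blunt-ended stems starting at the very ends are detected; the preferred but optional blunt end is the case this simple check targets.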

The hairpin nucleic acid can also possess one or more moieties which allow the hairpin nucleic acid to be attached to a solid substrate. Generally, such moieties will be located together in the vicinity of the center of the hairpin nucleic acid, so that when the hairpin nucleic acid has self-annealed, the moiety is located at the bend in the hairpin, allowing the bend to be attached to a solid substrate. The hairpin can be self-hybridized before or after attachment to the substrate.

In one embodiment, the hairpin nucleic acid is a molecular stem and loop structure formed from the hybridisation of complementary polynucleotides. The stem comprises the hybridized polynucleotides and the loop is the region that covalently links the two complementary polynucleotides. A double-stranded (duplex) region of anywhere from 4 to 100 base pairs may be used to form the stem.

In another embodiment, the hairpin nucleic acid is a molecule which is synthesized in a contiguous fashion but is not made up entirely of DNA; rather, the ends of the molecule comprise DNA bases that are self-complementary and can thus form an intramolecular duplex, while the middle of the molecule includes one or more non-nucleic acid molecules. An example of such a hairpin nucleic acid would be Nu-Nu-Nu-Nu-Nu-LM-Nc-Nc-Nc-Nc-Nc, where “Nu” is a particular nucleotide, “Nc” is the nucleotide complementary to Nu, and “LM” is the linker moiety linking the two strands, e.g., hexaethylene glycol (HEG) or polyethylene glycol (PEG). The non-nucleic acid molecule(s) can be linker moieties for linking the two nucleic acids together (the two nucleic acid halves of the overall hairpin nucleic acid), and can also be used to attach the overall hairpin nucleic acid to the substrate. Alternatively, the non-nucleic acid molecule(s) can be intermediate molecules which are in turn attached to linker moieties used for attaching the overall hairpin nucleic acid to the solid substrate.

In another embodiment, the hairpin nucleic acid is composed of two separate but complementary nucleic acid strands that are hybridized together to form an intermolecular duplex, and are then covalently linked together. The linkage can be accomplished by chemical crosslinking of the two strands, attaching both strands to one or more intercalators or chemical crosslinkers, etc.

By “double-stranded nucleic acid anchor”, or “anchor”, is meant a segment of double-stranded nucleic acid which, like the hairpin nucleic acid described above, is designed to contain one or more restriction sites capable of being acted on by one or more restriction endonucleases, e.g., a nicking endonuclease. The double-stranded nucleic acid anchor will have a first end and a second end. The first end is used for attachment of the template nucleic acid and the strand complementary to the template nucleic acid. The second end of the double-stranded nucleic acid anchor can possess one or more nucleotides which are modified to allow the double-stranded nucleic acid anchor to be attached to a solid substrate. Because the anchor is double-stranded, both the first end and the second end will each have a strand with a 3′ end and a strand with a 5′ end. The anchor can be a double-stranded oligonucleotide bonded to the substrate, or two single-stranded oligonucleotides bonded to the substrate and then hybridized.

Thus, the terms “hairpin,” “hairpin nucleic acid,” and “double-stranded nucleic acid anchor” include cross-linked (e.g., hybridized, chemically cross-linked, etc.) duplex nucleic acids or nucleic acid mimics (e.g., peptide nucleic acids (PNA)) which are capable of being recognized and acted upon by endonucleases and polymerases.

The hairpin nucleic acids and double-stranded nucleic acid anchors generally exist as molecules in solution before being attached to the solid substrate. In the case of hairpin nucleic acids, the hairpin nucleic acid can be hybridized to itself before or after it is attached to the substrate. In the case of double-stranded nucleic acid anchors, the two nucleic acid strands of the anchor can be hybridized together and the anchor then attached to the substrate, or the individual single-stranded components of the anchor can be attached to the surface and then hybridized together.

The hairpin nucleic acids and double-stranded nucleic acid anchors (whether self-hybridized or not) can be attached to the substrate in any way known in the art. Generally, such methods involve modifying the nucleic acid such that it contains a chemical group or biochemical or other molecule (e.g., biotin or streptavidin, etc.) that is either inherently reactive with the substrate or can be activated to bond to the substrate. Modifications can be made to any part of the nucleic acid, including linkers attached to the bases, sugars or phosphates, or at the 3′ and 5′ hydroxyl groups. Modification can be made at any part of the hairpin nucleic acid or double-stranded nucleic acid anchor to achieve surface attachment.

Stating that an endonuclease cuts “before, at or beyond the 3′ end” of a hairpin nucleic acid relies on the fact that the “restriction site” for a given endonuclease comprises both a “recognition sequence” and a “cleavage site”. The recognition sequence is the precise sequence of nucleotides recognized by a particular endonuclease, e.g., the recognition sequence for the nicking endonuclease N.BbvCIA is “GCTGAGG” (see Table 1). The cleavage site for this endonuclease is within this recognition sequence, between the “C” and the “T”. The restriction site for N.BstNBI is “GAGTCNNNN”, where “N” can be any nucleotide, so its specific recognition sequence is effectively “GAGTC”. The cleavage site for this endonuclease is four nucleotides 3′ of the end of this recognition sequence.
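The distinction between recognition sequence and cleavage site can be made concrete by writing each restriction site 5′ to 3′ with a caret marking the nick, as in Table 1, and scanning a strand for it. This is a hedged sketch: it scans only the strand passed in, handles only the IUPAC codes that appear in Table 1 (N, R, D), and the name `nick_positions` is hypothetical.

```python
import re

# IUPAC codes needed for the sites in Table 1 (N = any, R = A/G, D = A/G/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "D": "[AGT]"}


def nick_positions(strand: str, site: str) -> list[int]:
    """0-based positions p such that the enzyme nicks between strand[p-1] and strand[p].

    `site` is written 5'->3' with '^' at the cleavage point, e.g. 'GC^TGAGG'
    (N.BbvCIA) or 'GAGTCNNNN^' (N.BstNBI). Only the given strand is scanned.
    """
    cut = site.index("^")
    motif = site.replace("^", "")
    pattern = "".join(IUPAC[base] for base in motif)
    # A zero-width lookahead reports overlapping sites as well.
    return [m.start() + cut for m in re.finditer(f"(?={pattern})", strand.upper())]


print(nick_positions("AAGCTGAGGTT", "GC^TGAGG"))       # [4]: between C and T
print(nick_positions("GAGTCAATTGGCC", "GAGTCNNNN^"))   # [9]: GAGTC AATT^GGCC
```

A returned position equal to the length of the strand means the nick falls exactly at its 3′ end, which is the “at” case discussed below.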

There is no requirement that the restriction site be situated so that the endonuclease cuts or nicks exactly at the 3′ end of the hairpin nucleic acid. The cleavage site can lie within the hairpin nucleic acid, lie at the very end of the hairpin nucleic acid, or lie outside of it.

There exist nicking endonucleases that nick (cleave) at a position 3′ of the recognition sequence, that is, the recognition sequence and the cleavage site are separated by several (e.g., 4-5) nucleotides. Such nicking endonucleases include N.AlwI, N.BspD6I, N.Bst9I, N.BstNBI and N.BstSEI, where four arbitrary nucleotides separate the recognition sequence and the cleavage site, and N.MlyI, where five arbitrary nucleotides separate the recognition sequence and the cleavage site.

There is also no requirement that the recognition sequence be separated from the cleavage site. As shown in Table 1, there exist nicking endonucleases that cut (cleave) within their recognition sequence (e.g., N.BbvCIA, N.BbvCIB, N.Bpu10IA, N.Bpu10IB, N.CviPII, N.CviQXI), similar to the action of an ordinary restriction endonuclease (i.e., an enzyme that cleaves through both strands of a double-stranded nucleic acid).

Stating that an endonuclease cuts “before” the 3′ end of a hairpin nucleic acid means that the cleavage site for a particular endonuclease occurs before the 3′ end of the hairpin nucleic acid, and that nucleotides will be removed from the 3′ end of the hairpin nucleic acid. For instance, in the case of endonuclease N.BbvCIA, placing the recognition sequence for this endonuclease within a hairpin nucleic acid means that this endonuclease will, by definition, cleave at a point before the 3′ end of the hairpin nucleic acid.

Stating that an endonuclease cuts “at” the 3′ end of a hairpin nucleic acid means that the cleavage site is situated so that the endonuclease cleaves at a point exactly between the 3′ end of the hairpin nucleic acid and any nucleotides or nucleic acid strand added to it. For instance, in the case of N.BstNBI, the restriction site is “GAGTCNNNN^”. A hairpin nucleic acid that ends in the sequence . . . GAGTCATGC-3′ will be cut exactly at its 3′ end by N.BstNBI, thereby removing any nucleotides incorporated onto the end of the hairpin.

Stating that an endonuclease cuts “beyond” the 3′ end of a hairpin nucleic acid means that the endonuclease cleaves at a point beyond the 3′ end of the hairpin, between nucleotides that have been added to the hairpin. For instance, if a hairpin nucleic acid ends in the sequence . . . GAGTC-3′, and has a strand attached to it that begins with 5′-AATTGGCC . . . , then the endonuclease N.BstNBI will cut between the T and the G of the attached strand, that is, at GAGTC AATT^GGCC.

If the recognition sequence in the hairpin nucleic acid is that of a nicking endonuclease that cleaves within its recognition sequence, and the recognition sequence is placed at the very 3′ end of the hairpin, the nick will remove several nucleotides from the 3′ end of the hairpin (two in the case of N.CviPII and N.CviQXI; five in the case of N.BbvCIA, N.BbvCIB, N.Bpu10IA and N.Bpu10IB). Depending on the intended use of the hairpin nucleic acid, such a loss may be acceptable, because after removal of the complementary strand, the few nucleotides removed from the hairpin nucleic acid can be added back using the same reaction as that used to build up the complementary strand in the first place.
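The counts of two and five removed nucleotides follow directly from where the caret sits in each Table 1 site. The sketch below assumes the whole written site, including any N positions, ends flush with the hairpin's 3′ end; the helper name `nucleotides_removed` is hypothetical.

```python
def nucleotides_removed(site: str) -> int:
    """Nucleotides cut off the hairpin's 3' end when `site` (written 5'->3',
    '^' marking the nick, N positions included) is placed flush with that end."""
    motif = site.replace("^", "")
    return len(motif) - site.index("^")


# Nicking endonucleases from Table 1 that cleave inside their recognition sequence.
WITHIN_SITE_NICKERS = {
    "N.BbvCIA": "GC^TGAGG",
    "N.BbvCIB": "CC^TCAGC",
    "N.Bpu10IA": "GC^TNAGG",
    "N.Bpu10IB": "CC^TNAGC",
    "N.CviPII": "C^CD",
    "N.CviQXI": "R^AG",
}

for enzyme, site in WITHIN_SITE_NICKERS.items():
    print(enzyme, nucleotides_removed(site))
```

For comparison, a site such as GAGTCNNNN^ placed flush with the 3′ end gives 0, i.e., a nick exactly at the end of the hairpin.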

Some enzymes may not be useful for all applications. For instance, N.CviPII and N.CviQXI have very short recognition sequences (C^CD and R^AG, respectively), which occur frequently in random sequence, so these enzymes may also nick the template itself. If the template is short and does not contain these sequences, then these enzymes may be useful.
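How often a short site is expected to turn up in a template can be estimated from its length and degeneracy. The estimate below assumes a random sequence with equal base frequencies and counts sites on both strands; real templates will deviate from this, and the function names are hypothetical.

```python
from fractions import Fraction

# Number of bases each IUPAC code in Table 1 allows.
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "N": 4, "R": 2, "D": 3}


def site_probability(site: str) -> Fraction:
    """Chance that a given position of a uniform random sequence starts `site`."""
    p = Fraction(1)
    for base in site.replace("^", ""):
        p *= Fraction(DEGENERACY[base], 4)
    return p


def expected_sites(site: str, template_length: int, both_strands: bool = True) -> float:
    """Expected number of occurrences of `site` in a random template."""
    k = len(site.replace("^", ""))
    positions = max(template_length - k + 1, 0)
    return float(site_probability(site) * positions * (2 if both_strands else 1))


for enzyme, site in {"N.CviPII": "C^CD", "N.CviQXI": "R^AG", "N.BbvCIA": "GC^TGAGG"}.items():
    print(enzyme, site_probability(site), round(expected_sites(site, 25), 4))
```

Under these assumptions a 25-nucleotide template is expected to carry about 2.2 N.CviPII sites and about 1.4 N.CviQXI sites, against roughly 0.003 for the seven-base N.BbvCIA site, which is why the short-site enzymes suit only short templates known to lack them.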

There is no requirement that the restriction site be situated so that the endonuclease cuts or nicks exactly at the 3′ end of the first end of the double-stranded nucleic acid anchor. The endonuclease can cut or nick just before the 3′ end, if it is not necessary that perfect integrity of the double-stranded nucleic acid anchor be maintained. The endonuclease can also cut or nick beyond the 3′ end of the double-stranded nucleic acid anchor, if it is not detrimental that nucleotides be effectively added to the anchor.

If the recognition sequence in the hairpin nucleic acid is that of a nicking endonuclease that cleaves beyond the recognition sequence, the inclusion of such a recognition sequence in a hairpin nucleic acid will result in nicking of the strand at a location a few nucleotides beyond the recognition sequence. If the recognition sequence is located at the 3′ end of the hairpin nucleic acid, then cleavage will occur 4-5 nucleotides beyond the end of the hairpin nucleic acid. If, however, the 3′ end of the recognition sequence for any of N.AlwI, N.BspD6I, N.Bst9I, N.BstNBI and N.BstSEI is located four nucleotides from the end of the hairpin nucleic acid, then these enzymes will cut exactly at the end of the hairpin nucleic acid. If instead the 3′ end of the recognition sequence for any of these enzymes is located more than four nucleotides from the 3′ end of the hairpin nucleic acid, then the nicking endonuclease will nick before the 3′ end of the hairpin.
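The placement rule above reduces to simple arithmetic on the position of the recognition sequence. This sketch assumes an enzyme that nicks a fixed number of nucleotides 3′ of a fixed motif (GAGTC with offset 4 for N.BstNBI; GGATC with offset 4 for N.AlwI, per Table 1) and takes the occurrence nearest the 3′ end as the relevant one; `cut_relative_to_end` and `describe` are hypothetical names.

```python
def cut_relative_to_end(hairpin: str, motif: str = "GAGTC", offset: int = 4) -> int:
    """Signed distance of the nick from the hairpin's 3' end.

    < 0: the nick lies before the 3' end (that many nucleotides are removed)
    = 0: the nick falls exactly at the 3' end
    > 0: the nick falls beyond the 3' end, inside the strand added to the hairpin
    """
    start = hairpin.upper().rfind(motif)  # site nearest the 3' end
    if start == -1:
        raise ValueError(f"{motif} not found in hairpin")
    return start + len(motif) + offset - len(hairpin)


def describe(distance: int) -> str:
    if distance < 0:
        return f"before ({-distance} nt removed)"
    if distance == 0:
        return "at"
    return f"beyond ({distance} nt into the added strand)"


for hairpin in ("CCGAGTCATGC", "CCGAGTC", "GAGTCATGCAA"):
    print(hairpin, describe(cut_relative_to_end(hairpin)))
```

The three hairpins correspond to the “at”, “beyond” and “before” cases: four, zero and six nucleotides follow GAGTC, respectively.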

The endonuclease can cut or nick just before the 3′ end of the hairpin, if it is not necessary that perfect integrity of the hairpin be maintained. The endonuclease can also cut or nick beyond the 3′ end of the hairpin nucleic acid, if it is not detrimental that nucleotides be effectively added to the hairpin.

According to the invention, a hairpin nucleic acid is designed so that the restriction site for a nicking endonuclease is located so that the endonuclease will nick at a location before, at, or beyond the 3′ end of the hairpin. The hairpin is then self-annealed and a single-stranded template nucleic acid is attached to the 5′ end of the hairpin. After a sequencing or other reaction builds a synthetic strand complementary to the template nucleic acid, the synthetic complementary strand can be removed by (1) nicking with the nicking endonuclease that recognizes the restriction site within the hairpin, so that a nick is made at a point before, at or beyond the 3′ end of the hairpin, effectively “disconnecting” the synthetic complementary strand from the hairpin so that the two are no longer contiguous, and (2) washing away the synthetic complementary strand by standard denaturation, e.g., heat, formamide, NaOH, etc.

Practice of the method of the invention with a double-stranded nucleic acid anchor is very similar to using a hairpin nucleic acid. The present application largely discusses the use of hairpin nucleic acids in the invention; however, one of ordinary skill will readily understand that the double-stranded nucleic acid anchors can perform all of the same functions, and possess the same advantages over previous methods, as the hairpin nucleic acids.

It is to be understood that in stating that the cut made by the endonuclease is “before, at, or beyond” the 3′ end of the hairpin, it is meant that the cut is made in the vicinity of the 3′ end of the hairpin, and that the recognition sequence for the endonuclease is not located at the 5′ end of the hairpin nucleic acid, which would result in cleavage within the 5′ half of the hairpin nucleic acid. It is also understood that by saying that the cut may be made “beyond” the 3′ end of the hairpin nucleic acid, the distance beyond the 3′ end is constrained by the distance between the recognition sequence and cleavage site for the given endonuclease. For instance, of the nicking endonucleases in Table 1, none nicks at a point farther than five nucleotides from the recognition sequence. Therefore, no cleavage will occur farther than five nucleotides beyond the 3′ end of the hairpin nucleic acid, unless endonucleases are used which have cleavage sites that are further removed from their recognition sequences.

The hairpin nucleic acid or the double-stranded nucleic acid anchor can be attached to a substrate, e.g., in a spatially-addressable array.

“Template nucleic acid,” or “single-stranded template nucleic acid,” as used herein, means a linear single-stranded nucleic acid molecule which, when attached to the self-annealed hairpin nucleic acid (or anchor) described herein, is capable of being recognized and acted upon by a polymerase such that, under the proper conditions, the polymerase incorporates nucleotides onto the 3′ end of the hairpin nucleic acid, where each nucleotide is complementary to the corresponding nucleotide on the template nucleic acid, thereby extending the 3′ end of the hairpin and producing a nucleic acid strand complementary to the template nucleic acid. The term also includes a double-stranded nucleic acid that is attached to the hairpin, where one strand is then removed, leaving a single strand. The term can also include the ligation and covalent attachment of both strands of a double-stranded nucleic acid to the hairpin nucleic acid or double-stranded nucleic acid anchor, followed by nicking according to the methods described herein and washing to remove the nicked strand; that is, the method of the invention can itself be used in the attachment of the template nucleic acid to the hairpin nucleic acid or the double-stranded nucleic acid anchor. Alternatively, one strand of a double-stranded nucleic acid can be ligated to the hairpin nucleic acid or double-stranded nucleic acid anchor, and the second strand washed away.

The template can be any length that can be successfully sequenced, preferably 10 to 100 nucleotides, more preferably 15 to 100 nucleotides, most preferably 20 to 30 nucleotides. Although the term “template nucleic acid” is used herein, it will be appreciated by one of ordinary skill that the invention is not limited to sequencing reactions, but that the techniques can be used to assay the interaction of the “templates” with other molecules. Such embodiments are described below.

By stating that the template is “attached” to the hairpin or anchor, it is meant that the template nucleic acid is covalently attached.

By stating that the polymerase will act upon the template and incorporate nucleotides onto the 3′ end of the hairpin, it is meant that the polymerase will act given appropriate conditions, such as appropriate temperature, buffers, pH, nucleotides, and other reaction components and conditions required for action by the polymerase.

By “nucleic acid strand complementary to the template nucleic acid”, or “synthetic nucleic acid strand complementary to the template nucleic acid”, or more simply, “complement”, is meant a strand of nucleic acid which possesses a sequence that is complementary to that of the template nucleic acid, that is, the complement and the template nucleic acid can hybridize and form a stretch of double-stranded nucleic acid.

By stating that the template or complement is “attached” to the hairpin or anchor, it is meant that the template nucleic acid or its complement is covalently attached.

As used herein, the term “array” refers to a population of hairpin nucleic acids or double-stranded nucleic acid anchors that are distributed over a solid support. The nucleic acids can be distributed in a single molecule array, that is, the nucleic acids are spaced at a distance from one another sufficient to permit their individual resolution. Alternatively, nucleic acids of one type can be clustered at a single address, such that one or more nucleic acids at the address can be detected.

“Solid support”, as used herein, refers to the material to which the hairpins and/or anchors are attached. Suitable solid supports are available commercially, and will be apparent to the skilled person. The supports can be manufactured from materials such as glass, ceramics, silica and silicon. Supports with a gold surface may also be used. The supports usually comprise a flat (planar) surface, or at least a structure in which the molecules to be interrogated are in approximately the same plane. Alternatively, the solid support can be non-planar, e.g., a microbead. Any suitable size may be used. For example, the supports might be on the order of 1-10 cm in each direction.

In one aspect of the invention, the “array” is a device comprising a “single molecule array,” that is, a plurality of the hairpins and/or anchors of the invention immobilized on the surface of a solid support at a density that permits individual resolution of at least two of the molecules and their attached templates. “Plurality” is used to mean that multiple molecules are placed on the array. The molecules can all be of the same type or of multiple (i.e., different) types; that is, the array can be composed entirely of hairpins, entirely of anchors, or of a mixture of the two. In general, the hairpins/anchors are at a density of 10⁶ to 10⁹ individually resolvable polynucleotides per cm², preferably 10⁷ to 10⁹ individually resolvable polynucleotides per cm².
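A quick way to relate these densities to optical resolution is to convert them into a centre-to-centre spacing. The sketch assumes the molecules sit on an idealized square grid; randomly placed molecules have smaller nearest-neighbour gaps, so the figures are best read as upper bounds. The helper name is hypothetical.

```python
import math

UM_PER_CM = 1e4  # micrometres per centimetre


def mean_spacing_um(density_per_cm2: float) -> float:
    """Centre-to-centre spacing (micrometres) on a square grid at this density."""
    return UM_PER_CM / math.sqrt(density_per_cm2)


for density in (1e6, 1e7, 1e9):
    print(f"{density:.0e} per cm^2 -> {mean_spacing_um(density):.2f} um")
```

The grid spacing falls from 10 µm at 10⁶ per cm² to about 3.2 µm at 10⁷ per cm² and about 0.32 µm at 10⁹ per cm², so the top of the range approaches the scale at which neighbouring single fluorophores become hard to separate with conventional optical microscopy.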

In another aspect of the invention, the “array” is a device comprising a high-density array, that is, one in which each individual address on the array comprises a cluster of nucleic acids of the same type, while another address on the array comprises a cluster of nucleic acids of a different type. Detection of an address is done by detecting one or more individual nucleic acids at the address.

As used herein, the term “interrogate” means contacting one or more of the hairpins and/or anchors with another molecule, e.g., a polymerase, a nucleoside triphosphate, or a complementary nucleic acid sequence, wherein the physical interaction provides information regarding a characteristic of the arrayed molecule and the template nucleic acid attached to it. The contacting can involve covalent or non-covalent interactions with the other molecule. As used herein, “information regarding a characteristic” means information regarding the sequence of one or more nucleotides in the template, the length of the template, the base composition of the template, the melting temperature (Tm) of the polynucleotide, the presence of a specific binding site for a polypeptide or other molecule, the presence of an adduct or modified nucleotide, or the three-dimensional structure of the template.

The term “individually resolved by optical microscopy” is used herein to indicate that, when visualized, it is possible to distinguish at least one polynucleotide on the array from its neighbouring polynucleotides using optical microscopy methods available in the art. Visualisation may be effected by the use of reporter labels, e.g., fluorophores, the signal of which is individually resolved.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating a hairpin-template-complement complex, and the recovery and regeneration of the template nucleic acid.

FIG. 2 is a diagram illustrating the steps in sequencing a single-stranded nucleic acid template attached by a hairpin (or other anchoring sequence) to a substrate.

FIG. 3 is a diagram showing a hairpin containing a nicking site of the nicking endonuclease N.BstNBI.

FIG. 4 is a diagram showing a hairpin containing a cleavage site of the blunt end endonuclease MlyI.

FIG. 5 is a diagram showing a double-stranded nucleic acid anchor containing a nicking site of the nicking endonuclease N.BstNBI.

DETAILED DESCRIPTION

The present invention discloses a method of determining the presence and locations of methylated cytosines in a template nucleic acid sequence. The method comprises the steps of sequencing a template nucleic acid, treating it with sodium bisulfite to convert unmethylated cytosines to uracils, and then resequencing the template nucleic acid to determine at which positions methylated cytosines are present, that is, where cytosines are not converted to uracils. The method uses a method for regenerating a single-stranded nucleic acid template following its conversion to a double-stranded product, e.g., during a sequencing reaction. The invention also uses a method of removing a double-stranded nucleic acid from its substrate, e.g., removing a double-stranded nucleic acid from another molecule anchoring it to a solid substrate, or from a hairpin nucleic acid anchoring the double-stranded nucleic acid to a solid substrate.

Single-molecule sequencing allows complete genomes to be sequenced on a single microarray chip in a single sequencing reaction. The principle of this technology is that large numbers of short sequences from fragmented DNA are immobilized as single strands on a surface where they can be individually visualized with a sensitive microscope and camera. Every fragment is then sequenced simultaneously with fluorescent nucleotides and a polymerase enzyme, and the sequence information from all of the molecules is recorded simultaneously within a single camera frame. The method does not rely on DNA amplification by PCR or on any sub-cloning steps; instead, tiny quantities of DNA can be sequenced directly, immediately after being extracted from the source. When a sequencing reaction is complete, the single-stranded template strand can be regenerated by enzymatic cleavage of the newly synthesized sequencing strand as described herein. The DNA is then treated with sodium bisulfite, which converts unmethylated cytosines to uracils. If a second sequencing reaction is then performed on the template, the detection of cytosines will indicate that those bases are methylated.
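The effect of the bisulfite step on the second read can be modelled at the sequence level. This is a simplified sketch that assumes complete conversion of every unmethylated cytosine in the single-stranded template and error-free reads; the positions in `methylated` are supplied by hand purely for illustration, and the function names are hypothetical.

```python
def bisulfite_convert(template: str, methylated: set[int]) -> str:
    """Model complete bisulfite conversion of a single-stranded template:
    unmethylated C becomes U, while 5-methyl-C (positions in `methylated`) is protected."""
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(template)
    )


def as_read(converted: str) -> str:
    """The polymerase pairs A opposite U, so a template U is reported as T."""
    return converted.replace("U", "T")


template = "ACGTTCGACC"                          # first read: untreated template
second = as_read(bisulfite_convert(template, {1}))  # only the C at position 1 is methylated
print(second)                                    # ACGTTTGATT

# Cytosines that survive treatment mark methylated positions.
print([i for i, (a, b) in enumerate(zip(template, second)) if a == "C" and b == "C"])  # [1]
```

Incomplete conversion, the artifact discussed next, would leave extra surviving cytosines and so be misread as methylation.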

Unlike many other methylation detection techniques, the sodium bisulfite method does not rely on the presence of a restriction site or on any prior knowledge of the sequence context. Furthermore, as provided herein, the single-stranded nature of the template DNA avoids potential artifacts arising from the sodium bisulfite reaction, which are found in prior art techniques. Sodium bisulfite will only react with pyrimidines that are not base-paired, so any region that re-anneals during treatment escapes conversion. Various technical modifications to sodium bisulfite reactions have been attempted by others to reduce strand annealing, but incomplete conversion of unmethylated cytosines to uracils can still occur, resulting in incorrect interpretation of data.

As an alternative to such techniques, a pool of fragmented DNA can be split into two portions and immobilized as single strands on separate microarrays. One array can be treated with bisulfite and then both arrays sequenced. A comparison of the sequence data from the two arrays will indicate sites of methylation. This approach avoids the need to regenerate a sequencing template and requires only one sequencing reaction per microarray, although it requires the use of two microarrays and twice the amount of DNA.

Another alternative is to attach the template nucleic acids to hairpin nucleic acids or double-stranded nucleic acid anchors as described herein, which permit the recovery and regeneration of the original single-stranded template nucleic acid after it has been sequenced and converted to a double-stranded product. After such regeneration and recovery, the template nucleic acid can be treated with sodium bisulfite and resequenced, producing the second set of results on the same template nucleic acids on the same array.

The use of the methods described herein on a single-molecule array thus represents a technically simple procedure to assess methylation patterns across an entire genome without prior knowledge of restriction sites and without the artifacts of conventional bisulfite methodologies.

To regenerate the template nucleic acid between the two sequencing reactions, a hairpin nucleic acid containing a restriction site is provided, i.e., a single-stranded nucleic acid with a region of internal complementarity (i.e., one that is capable of hybridizing to itself and forming a hairpin) that also contains a restriction site. The hairpin nucleic acid has, near its 3′ end, a restriction site for a nicking endonuclease. The restriction site is situated so that the nicking endonuclease will nick at a point before, at, or beyond the 3′ end of the single-stranded nucleic acid. A nicking endonuclease acting upon such a restriction site in such a nucleic acid is shown in FIG. 1.

To use the hairpin to recover a template nucleic acid, a single-stranded nucleic acid template is attached to the 5′ end of the hairpin. This can be done in a number of ways. A single-stranded nucleic acid can be attached to the hairpin. Alternatively, a double-stranded nucleic acid can be attached to the hairpin, and either one strand ligated to the hairpin, or both strands ligated and then one strand removed, e.g., according to the methods described herein. The hairpin nucleic acid is then self-annealed to form a hairpin with an attached template nucleic acid. Alternatively, the hairpin can be self-annealed first, with the single-stranded template nucleic acid then being attached to the hairpin. Once the template nucleic acid is attached to the hairpin, it is in a position to be “recovered” following a sequencing or other reaction that builds up a strand complementary to the template nucleic acid and attached to the 3′ end of the hairpin.

During such a reaction, such as that shown in FIG. 2, single nucleotides are generally incorporated onto the 3′ end of the hairpin, where each nucleotide is complementary to the nucleotide opposite it on the template strand. The end result of such a reaction is that the single-stranded template nucleic acid is no longer single-stranded; instead, it is base-paired to a synthetic complementary strand. The result is a double-stranded nucleic acid molecule: the original template nucleic acid and its synthetic complementary strand, attached to a hairpin nucleic acid.

The template nucleic acid can then be recovered according to the invention, that is, the complementary strand can be removed by contacting the double-stranded nucleic acid molecule plus hairpin with a nicking endonuclease that is capable of recognizing the restriction site that is in the hairpin nucleic acid, near what was its original 3′ end. Because the restriction site is situated so that the nicking endonuclease will create a “nick” at a point before, at, or beyond the original 3′ end of the hairpin nucleic acid, the nick will be made before, at, or just beyond the junction between what was originally the 3′ end of the hairpin and the start of the strand complementary to the template nucleic acid (see, e.g., FIG. 1).

When a nick is introduced, the sequence distal to the cleavage is no longer contiguous with the sequence proximal to it. That is, the hairpin and the synthetic complementary strand are no longer contiguous. Rather, the synthetic complementary strand effectively becomes a separate, discrete single strand of nucleic acid that is hybridized to the template nucleic acid. The synthetic complementary strand is thus amenable to being washed away by denaturing the overall nucleic acid complex using heat or chaotropic conditions, such as high concentrations of a chaotropic agent (e.g., urea). After the synthetic strand is washed away, the template nucleic acid is still attached to the hairpin and is available for re-sequencing.
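Because the template, the hairpin and the synthesized complement form one continuous covalent strand, the nick-and-wash step can be modelled as cutting a string and keeping the piece that carries the surface-attached loop. The sketch below is illustrative only: the hairpin sequence is hypothetical (a 9-bp stem whose 3′ arm is GAGTC plus four nucleotides, so an N.BstNBI-type nick falls exactly at the hairpin's 3′ end), and denaturation is assumed to remove every fragment not covalently joined to the loop.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical hairpin: 5' arm + loop + 3' arm. The 3' arm is GAGTC followed by four
# nucleotides, so a GAGTCNNNN^ nick falls exactly at the hairpin's 3' end.
ARM_3 = "GAGTCATGC"
LOOP = "TTTT"
HAIRPIN = revcomp(ARM_3) + LOOP + ARM_3


def nick_sites(strand: str, motif: str = "GAGTC", offset: int = 4) -> list[int]:
    """Positions p (nick between strand[p-1] and strand[p]) strictly inside the strand."""
    cuts = (m.start() + len(motif) + offset
            for m in re.finditer(f"(?={motif})", strand))
    return sorted(p for p in cuts if 0 < p < len(strand))


def regenerate(template: str) -> str:
    """Nick, denature and wash; return what stays covalently joined to the loop."""
    complement = revcomp(template)            # product of the sequencing reaction
    strand = template + HAIRPIN + complement  # template 3' end joined to hairpin 5' end
    loop_index = len(template) + len(ARM_3)   # first loop base (attached to the surface)
    bounds = [0] + nick_sites(strand) + [len(strand)]
    for start, end in zip(bounds, bounds[1:]):
        if start <= loop_index < end:
            return strand[start:end]          # every other fragment is washed away
    raise AssertionError("unreachable: the loop lies in exactly one fragment")


print(regenerate("ACGTTAGCCA") == "ACGTTAGCCA" + HAIRPIN)  # True: template recovered
print(regenerate("AAGAGTCAAAAT"))  # template carries GAGTC itself and is truncated
```

The second call shows why the template should not contain the nicking site: the internal nick cuts the template, and its 5′ portion is lost in the wash.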

Although one embodiment described above uses a hairpin containing a single restriction site for a nicking endonuclease, the sequence of the hairpin can be designed to contain multiple restriction sites, e.g., for nicking endonucleases or other types of enzymes, such as blunt end endonucleases and/or ordinary restriction enzymes.

For instance, the hairpin can contain restriction sites for both a nicking endonuclease and a blunt end endonuclease. With such a hairpin, one can choose either to recover the template by selectively removing the synthetic complement, as described above, or to use the blunt end endonuclease to remove both the synthetic complement and the template, leaving only the hairpin.

The use of a ‘nicking’ class of enzyme to regenerate the template DNA on an arrayed surface, or of a Type IIS endonuclease to regenerate a blunt hairpin, is described. The two enzymes may share a common restriction site, or may use different restriction sites. Two of the enzymes discussed herein, N.BstNBI and MlyI, exemplify enzymes that share a common restriction site: both recognize the same sequence of nucleotides (GAGTC), but cleave at different locations. In the case of enzymes that do not share a common restriction site, the different restriction sites can be included in the design of the hairpin/anchor sequence.
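The difference between the two enzymes can be shown by listing where each strand is cut relative to the shared GAGTC sequence. The offsets are an assumption stated for illustration: N.BstNBI nicks only the GAGTC-bearing strand four nucleotides 3′ of the site (as in Table 1), while MlyI is taken to cut both strands five nucleotides 3′ of the site, leaving a blunt end. Only GAGTC on the top strand is scanned, and `cut_map` is a hypothetical name.

```python
import re

# Offsets (nucleotides 3' of GAGTC on the top strand) at which each strand is cut;
# None means that strand is left intact.
CUT_RULES = {
    "N.BstNBI": {"top": 4, "bottom": None},  # nicking enzyme: one strand only
    "MlyI": {"top": 5, "bottom": 5},         # Type IIS: both strands, same place -> blunt
}


def cut_map(enzyme: str, top: str) -> dict[str, list[int]]:
    """Cut positions in top-strand coordinates: p lies between top[p-1] and top[p]
    (and between the partners of those bases on the bottom strand)."""
    rule = CUT_RULES[enzyme]
    cuts = {"top": [], "bottom": []}
    for m in re.finditer("(?=GAGTC)", top):
        site_end = m.start() + len("GAGTC")
        for strand, offset in rule.items():
            if offset is not None and site_end + offset < len(top):
                cuts[strand].append(site_end + offset)
    return cuts


duplex_top = "CCGAGTCAATTGGCC"
print(cut_map("N.BstNBI", duplex_top))  # {'top': [11], 'bottom': []}: a nick
print(cut_map("MlyI", duplex_top))      # {'top': [12], 'bottom': [12]}: a blunt cut
```

The nick leaves the duplex intact until it is denatured, whereas the blunt double-strand cut releases everything 3′ of the cut, which is why MlyI regenerates a blunt hairpin rather than the template.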

The hairpin nucleic acids or double-stranded nucleic acid anchors can be used to recover the original template in an array, e.g., a device where multiple nucleic acid sequences are attached to a substrate, e.g., a device in which fragments of nucleic acid, e.g., DNA, from a genome of interest are attached to the surface of a glass slide by ligation to a DNA hairpin.

An advantage of the ability to regenerate a template is that second and subsequent rounds of sequencing on the same template should allow any random sequencing errors that arose during the first round of sequencing to be identified and eliminated, because a random error is unlikely to recur at the same position in an independent read. The method is therefore useful in confirming sequencing data.

In general, the hairpins and anchors are useful in situations where a single-stranded nucleic acid template has been made double-stranded, e.g., in a sequencing reaction, and there is then a need to remove the complementary strand that was synthesized and attached to the template.

Such a sequencing method is illustrated in FIG. 2. The sequence of bases in a template strand is determined by employing a polymerase enzyme to synthesize a complementary strand on the template strand one base at a time. FIG. 2 shows a substrate with a hairpin attached, and a template strand (with the nucleotides represented by circles and squares) attached to one of the ends of the hairpin. Individual bases are then added, each labeled with a different label, e.g., each with a different fluorophore. One complementary base is attached to the end of the hairpin (or the end of the growing synthetic strand) by incorporation, e.g., by a polymerase. The identity of the complementary nucleotide is then determined by detection of the fluorophore, e.g., by washing away unincorporated labeled nucleotides and subsequently detecting the attached fluorophore. The label is then cleaved off the recently incorporated nucleotide, e.g., by chemical means, and a nucleotide complementary to the next nucleotide in the template is incorporated into the growing complementary strand, the label detected and identified, and then cleaved off. Subsequent cycles of incorporation, detection and cleavage result in the sequencing of the complementary strand and, perforce, the deduction of the sequence of the original template nucleic acid. FIG. 2 shows the template attached to a hairpin, but the template could alternatively be attached to a segment of double-stranded nucleic acid, e.g., a double-stranded nucleic acid anchor.
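The incorporation, detection and cleavage cycle, and the orientation bookkeeping it implies, can be sketched as a loop. The dye names are hypothetical placeholders for four distinguishable labels, detection and cleavage are idealized as error-free, and the template is assumed to be joined by its 3′ end to the hairpin's 5′ end (as described for the hairpin above), so extension reads the template from its 3′ end toward its 5′ end.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical dye assignment: one distinguishable label per nucleotide.
DYE = {"A": "dye-1", "C": "dye-2", "G": "dye-3", "T": "dye-4"}
BASE_FOR_DYE = {dye: base for base, dye in DYE.items()}


def sequence_by_synthesis(template: str) -> tuple[str, str]:
    """Return (synthetic strand 5'->3', deduced template 5'->3') for an ideal run."""
    synthetic = []
    for template_base in reversed(template):        # extension starts at the template's 3' end
        incorporated = COMPLEMENT[template_base]    # polymerase adds one labelled nucleotide
        signal = DYE[incorporated]                  # wash, then image the attached label
        synthetic.append(BASE_FOR_DYE[signal])      # identify the base from its label
        # the label would now be cleaved off so the next cycle can proceed
    synthetic_strand = "".join(synthetic)
    deduced = "".join(COMPLEMENT[b] for b in reversed(synthetic_strand))
    return synthetic_strand, deduced


print(sequence_by_synthesis("ACGTTAGCCA"))  # ('TGGCTAACGT', 'ACGTTAGCCA')
```

The read is the reverse complement of the template, so the template sequence is deduced by reverse-complementing it.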

After a series of such incorporations, the original template strand is no longer single-stranded; instead, it is base-paired to a growing synthetic complementary strand. Eventually, the template strand may become entirely double-stranded. The hairpins and anchors allow the synthetic complementary strand to be removed, so that the sequenced template nucleic acid can be interrogated further. They also allow the blunt hairpins on the solid substrate to be regenerated, so that the device can be reused.

The hairpin nucleic acid used to attach the single-stranded template to the solid substrate is designed to contain within its sequence a restriction site for a nicking endonuclease. A “nicking endonuclease” is one of a class of enzymes that bind reversibly to a specific site in double-stranded nucleic acid and then cleave a phosphodiester bond in only one strand, at a short distance from the enzyme's binding site. The result is a “nick” in one strand of the double-stranded nucleic acid, rather than cleavage of both strands. In general, the nick leaves a 3′-hydroxyl and a 5′-phosphate at the cleavage site. When a nick is produced in a section of double-stranded nucleic acid, the sequence distal to the restriction site and cleavage site is no longer contiguous with the main body of the double-stranded nucleic acid. It becomes, in essence, a single strand hybridized to the rest of the nucleic acid. It can therefore be washed away after the nucleic acid is denatured, either with heat or under chaotropic conditions such as high concentrations of urea.

Several enzymes are known to nick DNA in a single strand, but most are found in multiple-protein complexes involved in DNA replication or DNA repair. As such, they have until now had limited applications in manipulating DNA in vitro. However, a number of these enzymes are commercially available and can be used to nick DNA under simple reaction conditions. For example, N.BstNBI (available from New England Biolabs, Beverly, Mass., USA) has been used to prepare substrates for studies of DNA repair mechanisms. This and other such enzymes are shown in Table 1, below. A number are available commercially (e.g., N.AlwI, N.BstNBI, N.BbvCIA and N.BbvCIB are available from New England BioLabs, Inc., Beverly, Mass., USA). Information on enzymes and their cleavage sites can be found in the relevant scientific literature and/or in public databases, e.g., REBASE (Roberts et al., 2001, Nucl. Acids Res. 29:268-269) (“rebase/”), which is maintained by New England Biolabs on its web site (“neb.com”).

TABLE 1 Nicking endonucleases and their restriction sites (^ marks the cleavage site).
Enzyme | Restriction site (5′ to 3′) | Isoschizomers
N.AlwI | GGATCNNNN^ |
N.BbvCIA | GC^TGAGG |
N.BbvCIB | CC^TCAGC |
N.Bpu10IA | GC^TNAGG |
N.Bpu10IB | CC^TNAGC |
N.BspD6I | GAGTCNNNN^ | N.Bst9I, N.BstNBI, N.BstSEI, N.MlyI
N.Bst9I | GAGTCNNNN^ | N.BspD6I, N.BstNBI, N.BstSEI, N.MlyI
N.BstNBI | GAGTCNNNN^ | N.BspD6I, N.Bst9I, N.BstSEI, N.MlyI
N.BstSEI | GAGTCNNNN^ | N.BspD6I, N.Bst9I, N.BstNBI, N.MlyI
N.CviPII | C^CD |
N.CviQXI | R^AG |
N.MlyI | GAGTCNNNNN^ |
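The site notation in Table 1 can be read mechanically. IUPAC ambiguity codes stand for sets of bases (N = any base, R = A or G, D = not C), and ^ marks the cleavage point on the listed strand. The helper names below are hypothetical. The sketch searches only the strand as written, so a site on the opposite strand would also require a search of the reverse complement.

```python
import re

# IUPAC codes that appear in the tables, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def parse_site(site: str) -> tuple[str, int]:
    """'GAGTCNNNN^' -> (regex for GAGTCNNNN, 9 bases before the cut)."""
    cut = site.index("^")
    regex = "".join(IUPAC[base] for base in site.replace("^", ""))
    return regex, cut

def cut_positions(sequence: str, site: str) -> list[int]:
    """0-based gaps in `sequence` (strand as written) where the site is cleaved."""
    regex, cut = parse_site(site)
    # A zero-width lookahead lets overlapping occurrences all be reported.
    return [m.start() + cut for m in re.finditer(f"(?={regex})", sequence)]

print(cut_positions("TTGAGTCAAAAGG", "GAGTCNNNN^"))  # [11]   (N.BstNBI)
print(cut_positions("GAGAAG", "R^AG"))               # [1, 4] (N.CviQXI)
```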

The position of the restriction site for the nicking endonuclease can be chosen so that the enzyme cleaves the synthetic complementary strand from the main body of the hairpin and the genomic template strand. After this detached section is washed away, the template strand remains attached to the hairpin and is available for re-sequencing or other applications.

N.BstNBI recognizes the asymmetric sequence GAGTC (SEQ ID NO: 1) in double-stranded DNA and nicks between the fourth and fifth bases downstream of this sequence in the same strand. As described herein, this restriction site has been incorporated into the 3′ end of DNA hairpins so that the N.BstNBI enzyme nicks the hairpin just upstream of the synthetic complementary strand, thereby detaching that strand from the hairpin.
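The following is a minimal sketch of the N.BstNBI step on the lower strand of the hairpin. It assumes that the strand is written 5′ to 3′ and begins with the GAGTC of the hairpin arm. The function names and the example sequence are hypothetical. `bstnbi_nicks` reports every GAGTC it finds, because a GAGTC inside the synthesized strand would also be nicked. The sketch does not model the upper (template) strand.

```python
def bstnbi_nicks(strand: str) -> list[int]:
    """Positions (0-based gaps between bases) at which N.BstNBI would nick
    `strand`, written 5'->3': four bases 3' of each GAGTC, i.e. start + 9."""
    nicks = []
    start = strand.find("GAGTC")
    while start != -1:
        if start + 9 <= len(strand):
            nicks.append(start + 9)
        start = strand.find("GAGTC", start + 1)
    return nicks

def release_synthetic_strand(lower: str) -> tuple[str, str]:
    """Split the lower strand at the first nick: the 5' part stays joined to
    the hairpin, the 3' part is washed away after denaturation."""
    nicks = bstnbi_nicks(lower)
    if not nicks:
        raise ValueError("no N.BstNBI site in this strand")
    first = nicks[0]
    return lower[:first], lower[first:]

# Hypothetical lower strand: hairpin arm GAGTC + NNNN (here AGCT), followed by
# a synthesized strand GGATCC.
kept, released = release_synthetic_strand("GAGTCAGCT" + "GGATCC")
print(kept, released)  # GAGTCAGCT GGATCC
```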

Such a hairpin is shown in FIG. 3. The linear sequence of the hairpin is 5′-NNNNGACTC . . . (hairpin loop) . . . GAGTCNNNN-3′. The four nucleotides represented by “n” on the lower strand represent the synthesized nucleotides complementary to the four template sequence nucleotides represented by “N” on the upper strand. The enzyme N.BstNBI will nick the complementary strand at the position indicated by the arrow, thereby releasing the lower sequence “nnnn”.

The incorporation of this particular restriction site into the hairpin has an added advantage: it is also recognized by another endonuclease, MlyI. In contrast to N.BstNBI, MlyI cleaves both strands of the hairpin between the fifth and sixth bases downstream of the restriction site, producing a blunt end. Thus, adding this enzyme after a sequencing reaction on a hairpin allows the original blunt hairpin to be regenerated, as shown in FIG. 4.
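The difference between the nick and the blunt cut can be shown with a toy model. The upper strand (hairpin arm plus template) is written 3′ to 5′ under the lower strand (hairpin arm plus synthesized strand), so that column i of one strand pairs with column i of the other. All names and sequences here are hypothetical. The example uses a five-base arm after GAGTC so that the MlyI cut falls exactly at the end of the hairpin.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def make_duplex(lower: str) -> tuple[str, str]:
    """Return (lower strand 5'->3', fully paired upper strand written 3'->5')."""
    return lower, lower.translate(COMPLEMENT)

def nick_lower(duplex: tuple[str, str], col: int) -> tuple[str, str]:
    """N.BstNBI-style nick: after denaturation only the lower strand is shortened."""
    lower, upper = duplex
    return lower[:col], upper

def blunt_cut(duplex: tuple[str, str], col: int) -> tuple[str, str]:
    """MlyI-style cut: both strands end at the same column (blunt end)."""
    lower, upper = duplex
    return lower[:col], upper[:col]

def single_stranded_length(duplex: tuple[str, str]) -> int:
    """Number of upper-strand bases left without a partner."""
    lower, upper = duplex
    return len(upper) - len(lower)

# Hypothetical stem: GAGTC + five hairpin bases (AGCTA), then a synthesized strand GGATCC.
duplex = make_duplex("GAGTCAGCTA" + "GGATCC")
site = duplex[0].find("GAGTC")
print(single_stranded_length(nick_lower(duplex, site + 9)))  # 7 (upper strand left single-stranded)
print(single_stranded_length(blunt_cut(duplex, site + 10)))  # 0 (blunt end)
print(blunt_cut(duplex, site + 10)[0])                       # GAGTCAGCTA (original arm)
```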

“Blunt end endonucleases” are those which hydrolyze both strands of a nucleic acid and do so without leaving an overhanging end. A number of blunt end endonucleases are listed in Table 2, below.

TABLE 2 Blunt end endonucleases (Type II) (^ marks the cleavage site).
Enzyme | Restriction site (5′ to 3′) | Isoschizomers
AhaIII | TTT^AAA | DraI, PauAII, SruI
AluI | AG^CT | MltI
BalI | TGG^CCA | MlsI, Mlu31I, MluNI, MscI, Msp20I
BfrBI | ATG^CAT |
BloHII | CTGCA^G |
BsaAI | YAC^GTR | BstBAI, MspYI, PsuAI
BsaBI | GATNN^NNATC | Bse8I, BseJI, Bsh1365I, BsiBI, BsrBRI, MamI
BsrBI | CCG^CTC | AccBSI, BstD102I, Bst31NI, MbiI
BtrI | CAC^GTC | BmgBI
Cac8I | GCN^NGC | BstC8I
CviJI | RG^CY | CviTI
CviRI | TG^CA | HpyCH4V, HpyF44III
Eco47III | AGC^GCT | AfeI, AitI, Aor51HI, FunI
Eco78I | GGC^GCC | EgeI, EheI, SfoI
EcoICRI | GAG^CTC | Ecl136II, Eco53kI, MxaI
EcoRV | GAT^ATC | CeqI, Eco32I, HjaI, HpyCI, NsiCI
EsaBC3I | TC^GA |
FnuDII | CG^CG | AccII, BceBI, BepI, Bpu95I, Bsh1236I, Bsp50I, Bsp123I, BstFNI, BstUI, Bsu1532I, BtkI, Csp68KVI, CspKVI, FalII, FauBII, MvnI, ThaI
FspAI | RTGC^GCAY |
HaeI | WGG^CCW |
HaeIII | GG^CC | BanAI, BecAII, Bim19II, Bme361I, BseQI, BshI, BshFI, Bsp211I, BspBRI, BspKI, BspRI, BsuRI, BteI, CltI, DsaII, EsaBC4I, FnuDI, MchAII, MfoAI, NgoPII, NspLKI, PalI, Pde133I, PfLKI, PlaI, SbvI, SfaI, SuaI
HindII | GTY^RAC | HinJCI, HincII
HpaI | GTT^AAC | BstEZ359I, BstHPI, KspAI, SsrI
Hpy8I | GTN^NAC | HpyBII
LpnI | RGC^GCY | Bme142I
MlyI | GAGTCNNNNN^ | SchI
MslI | CAYNN^NNRTG | SmiMI
MstI | TGC^GCA | Acc16I, AosI, AviII, FdiII, FspI, NsbI, PamI, Pun14627I
NaeI | GCC^GGC | CcoI, PdiI, SauBMKI, SauHPI, SauLPI, SauNI, SauSI, Slu1777I
NlaIV | GGN^NCC | AspNI, BscBI, BspLI, PspN4I
NruI | TCG^CGA | Bsp68I, Mlu2I, Sbo13I, SpoI
NspBII | CMG^CKG | MspA1I
OliI | CACNN^NNGTG | AleI
PmaCI | CAC^GTG | AcvI, BbrPI, BcoAI, Eco72I, PmlI
PmeI | GTTT^AAAC | MssI
PshAI | GACNN^NNGTC | BoxI, BstPAI
PsiI | TTA^TAA |
PvuII | CAG^CTG | BavI, BavAI, BavBI, Bsp153AI, BspM39I, BspO4I, Cfr6I, DmaI, EclI, NmeRI, Pae17kI, Pun14627II, Pvu84II
RsaI | GT^AC | AfaI, HpyBI, PlaAII
ScaI | AGT^ACT | Acc113I, AssI, DpaI, Eco255I, RflFII
SciI | CTC^GAG |
SmaI | CCC^GGG | CfrJ4I, PaeBI, PspALI
SnaBI | TAC^GTA | BstSNI, Eco105I
SrfI | GCCC^GGGC |
SspI | AAT^ATT |
SspD5I | GGTGANNNNNNNN^ |
StuI | AGG^CCT | AatI, AspMI, Eco147I, GdiI, PceI, Pme55I, SarI, Sru30DI, SseBI, SteI
SwaI | ATTT^AAAT | BstRZ246I, BstSWI, MspSWI, SmiI
XcaI | GTA^TAC | BspM90I, BssNAI, Bst1107I, BstBSI, BstZ17I
XmnI | GAANN^NNTTC | Asp700I, BbvAI, MroXI, PdmI
ZraI | GAC^GTC |
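For the palindromic entries in Table 2, bluntness follows from simple geometry. The enzyme cuts each strand at the same offset from that strand's 5′ end, so the two cuts line up only if the offset is half the site length. The sketch below checks this condition; its function names are hypothetical. The check does not apply to non-palindromic entries such as MlyI or SspD5I. For those, it returns False even though the enzymes are listed as blunt cutters.

```python
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(site: str) -> str:
    return site.translate(IUPAC_COMPLEMENT)[::-1]

def is_centered_blunt_palindrome(entry: str) -> bool:
    """True for an entry such as 'GATNN^NNATC' whose site reads the same on
    both strands and is cut exactly in the middle, so both cuts coincide."""
    cut = entry.index("^")
    site = entry.replace("^", "")
    return site == reverse_complement(site) and 2 * cut == len(site)

for entry in ("TTT^AAA", "GATNN^NNATC", "CMG^CKG", "GAGTCNNNNN^"):
    print(entry, is_centered_blunt_palindrome(entry))
# The first three print True; GAGTCNNNNN^ (MlyI) prints False because the
# site is not palindromic, so this test simply does not apply to it.
```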

It is to be understood that the enzymes used in the invention can be those discovered in nature (i.e., naturally-occurring enzymes), or can be enzymes created by mutation of existing enzymes.

The regeneration protocol is not restricted solely to arrays containing hairpin DNA molecules or DNA molecules constructed on hairpins (e.g., ligated genomic DNA). Instead, the template can be attached to a double-stranded nucleic acid “anchor” that incorporates the restriction site(s). Such an embodiment is shown in FIG. 5 for the N.BstNBI enzyme.

The hairpins and anchors can be used on double-stranded arrays formed by hybridization of complementary sequences to a single-stranded array, for example, hybridization of a PCR product generated from primers containing a restriction site for a nicking enzyme. Furthermore, the protocol can be applied to other types of arrays besides single-molecule arrays, i.e., arrays where multiple copies of the same DNA molecule are present at the same locus on the chip.

The hairpin/anchor can also be designed to include one or more restriction sites for nicking endonucleases, blunt end endonucleases, or restriction endonucleases.

For instance, the enzyme N.BstNBI recognizes the sequence 5′-GAGTC-3′ and cleaves the strand between the fourth and fifth nucleotides 3′ of this sequence. This sequence can be incorporated into the hairpin: 5′-NNNNGACTC...GAGTCNNNN-3′, where “...” represents a number of nucleotides or other moieties added to form the “loop” of the hairpin. Because a hairpin sequence cannot turn back on itself immediately, it is preferable to add 1 to 1000 nucleotides, more preferably 1 to 100 nucleotides, to form the curve of the loop between the complementary portions of the sequence.
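The linear hairpin sequence can be generated from its 3′ arm. The 5′ arm is the reverse complement of the 3′ arm, and the loop joins the two. This is a sketch with hypothetical names. N is kept as a symbolic placeholder that is treated as pairing with N. The four-base loop TTTT is an arbitrary choice within the 1 to 100 nucleotide range given above.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement; N (any base) is kept as a placeholder."""
    return seq.translate(COMPLEMENT)[::-1]

def design_hairpin(three_prime_arm: str, loop: str) -> str:
    """Linear 5'->3' hairpin: reverse-complement arm, loop, then the arm itself."""
    return revcomp(three_prime_arm) + loop + three_prime_arm

def arms_pair(hairpin: str, arm_length: int) -> bool:
    """Check that the two ends can fold back and base-pair over arm_length bases."""
    return hairpin[:arm_length] == revcomp(hairpin[-arm_length:])

hp = design_hairpin("GAGTC" + "NNNN", loop="TTTT")  # hypothetical 4-nt loop
print(hp)                # NNNNGACTCTTTTGAGTCNNNN
print(arms_pair(hp, 9))  # True
```

Passing the arm "GAGTC" + "NNNNN" instead gives the MlyI-compatible layout with one extra base in each arm.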

The MlyI restriction site can be “added” to the above sequence by merely adding an extra nucleotide: 5′-NNNNNGACTC...GAGTCNNNNN-3′.

This sequence would form the hairpin:

           2
CTCAGNNNN N▾-5′
GAGTCNNNN▴N▴-3′
         1 2

where, when the sequence has formed a hairpin, arrow “1” indicates the site of the nick made by N.BstNBI, and arrow “2” indicates the site on each “strand” that is cut by MlyI.

One can also make use of enzymes that do not recognize the same site. For instance, the blunt end endonuclease SspD5I recognizes the sequence 5′-GGTGANNNNNNNN^-3′. This site can be added to the hairpin shown above by overlapping the end of the SspD5I recognition sequence (GGTGA) with the start of the N.BstNBI and MlyI recognition sequence (GAGTC):

              2,3
CCACTCAGNNNN N▾-5′
GGTGAGTCNNNN▴N▴-3′
            1 2,3

where arrow “1” indicates the site of the nick made by N.BstNBI, and arrow “2,3” indicates the site on each “strand” that is cut by either MlyI or SspD5I.
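The statement that the SspD5I, MlyI and N.BstNBI sites can share a cut position can be checked by counting bases from each recognition sequence. The helper `cut_after` is hypothetical, as is the use of ACGTA as a stand-in for the five N positions.

```python
def cut_after(strand: str, motif: str, bases_after_motif: int) -> int:
    """Gap position (0-based) lying `bases_after_motif` bases 3' of `motif`."""
    start = strand.find(motif)
    if start == -1:
        raise ValueError(f"{motif} not found")
    return start + len(motif) + bases_after_motif

# Lower strand of the stem: GGTGAGTC followed by five placeholder bases.
lower = "GGTGAGTC" + "ACGTA"  # ACGTA is an arbitrary stand-in for NNNNN
nick_1 = cut_after(lower, "GAGTC", 4)  # N.BstNBI: GAGTCNNNN^
cut_2 = cut_after(lower, "GAGTC", 5)   # MlyI:     GAGTCNNNNN^
cut_3 = cut_after(lower, "GGTGA", 8)   # SspD5I:   GGTGANNNNNNNN^
print(nick_1, cut_2, cut_3)  # 12 13 13
```

Both double-strand cuts fall at gap 13, which is the end of the hairpin stem, and the nick lies one base further in.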

There is no requirement that the cleavage sites of the enzymes coincide, and a number of different sites can be incorporated into the same sequence. For instance, the sequence

5′-GAGTC▴NAC▴C▴D▴-3′
        3   4 1 2

has a nicking site for N.BstNBI (restriction site GAGTCNNNN^) at arrow “1”, a cleavage site for the blunt cutter MlyI (restriction site GAGTCNNNNN^) at arrow “2”, a cleavage site for the blunt cutter Hpy8I (restriction site GTN^NAC) at arrow “3”, and a nicking site for N.CviPII (restriction site C^CD) at arrow “4”. Thus, a variety of restriction sites can be designed into the hairpin or anchor.
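The four arrow positions on this sequence can be checked with regular expressions. To obtain a concrete sequence, the sketch makes the arbitrary choice N = T and D = A. The patterns are written out by hand from the restriction sites quoted in the text, only the strand shown is searched, and the dictionary keys are the arrow numbers.

```python
import re

# The text's 5'-GAGTCNACCD-3' with the arbitrary choice N = T, D = A.
SEQUENCE = "GAGTCTACCA"

# Arrow number -> (enzyme, regex for the recognition sequence, bases before the cut).
SITES = {
    1: ("N.BstNBI", r"GAGTC[ACGT]{4}", 9),  # GAGTCNNNN^
    2: ("MlyI", r"GAGTC[ACGT]{5}", 10),     # GAGTCNNNNN^
    3: ("Hpy8I", r"GT[ACGT]{2}AC", 3),      # GTN^NAC
    4: ("N.CviPII", r"CC[AGT]", 1),         # C^CD
}

def arrow_positions(sequence: str) -> dict[int, list[int]]:
    """For each arrow, the 0-based gaps in `sequence` where that enzyme cuts."""
    return {
        arrow: [m.start() + offset for m in re.finditer(f"(?={regex})", sequence)]
        for arrow, (_enzyme, regex, offset) in SITES.items()
    }

print(arrow_positions(SEQUENCE))  # {1: [9], 2: [10], 3: [5], 4: [8]}
```

Reading the gaps from left to right (5, 8, 9, 10) gives the arrow order 3, 4, 1, 2 marked on the sequence.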

The hairpin can also be designed to have an overhang, that is, one “strand” can be longer than the other. This increases the number of possible restriction sites that can be designed into the hairpin. For instance, the hairpin:

CTCAGNACCGGT-5′

GAGTCNTGG-3′

can have a nucleic acid template added to its 5′ end:

CTCAGNACCGGTNNNN . . . -5′

GAGTCNTGG              -3′.

Synthesis of the complementary strand will produce the following double-stranded nucleic acid:

           2  3
CTCAGNACC G▾GT▾NNNN . . . -5′
GAGTCNTGG▴C▴CA▴NNNN . . . -3′
         1 2  3

which can be nicked at position 1 by N.BstNBI, and is cleavable across both strands at position 2 by MlyI and at position 3 by BalI, another blunt cutter with restriction site TGG^CCA. The single-stranded template can be regenerated by use of N.BstNBI, or the original hairpin can be recovered by using BalI, followed by N.BstNBI to recover the overhang. Alternatively, a new type of blunt hairpin can be made by incorporating “CCA” onto the 3′ end of the hairpin to make it completely double-stranded.

Such overhangs can also be added to blunt hairpins by adding the overhang in the same way one would add a single-stranded nucleic acid template. This can be used to engineer a variety of restriction sites into the new hairpin. The actual template can then be added to the new overhang.

All of the hairpins and methods for designing such hairpins, as discussed above, can also be synthesized in the form of double-stranded nucleic acid “anchors”, to be attached to a solid substrate and to serve as an intermediate molecule anchoring the template to the solid substrate.

All of the sequences described above have had restriction sites designed into the 5′ to 3′ strand of the hairpin/anchor, with the 5′ end of the restriction site being closest to the substrate or anchoring point. Alternatively, however, this can be reversed. If one wished to use an enzyme that operates in the 3′ to 5′ direction, the sites can be designed into the other “strand” of the hairpin or the other strand of the anchor.

The sites to be designed into the hairpins and anchors can be chosen for a variety of reasons, including an enzyme's specificity or non-specificity, ease of use, longevity, etc.

Alternatively, one can use enzymes that cleave beyond the 5′ end of their recognition sites. Enzymes for use in this way can be those discovered in nature (i.e., naturally-occurring enzymes), or can be created by mutation of existing enzymes. Such enzymes include, e.g., BcgI, BsaXI and BssKI. BssKI, for example, cleaves as follows:

5′ . . . ^CCNGG . . . 3′
3′ . . . GGNCC^ . . . 5′

A mutant of BssKI (or another enzyme) can be made which cleaves in only one strand. This site can be included in a hairpin or anchor as described herein, where the hairpin or anchor has non-cleavable phosphorothioate bonds on the 5′ half of the hairpin, so that cleavage only occurs in the 3′ half of the hairpin, thereby creating a nick.

In another embodiment, the hairpin nucleic acid or double-stranded nucleic acid anchor can be designed so that the portion to which the template nucleic acid is attached contains non-cleavable bonds. That is, in the portion of the hairpin/anchor to which the template nucleic acid is attached, the nucleotides are joined to each other by bonds that are not cleavable by an endonuclease. In such a hairpin/anchor, an ordinary restriction endonuclease can be used, but it will behave as a nicking endonuclease and will cleave only one strand: the one with cleavable bonds between its nucleotides.

The non-cleavable bonds can be phosphorothioate bonds, which are easily added during the synthesis of the hairpin/anchor. Any modification of the phosphodiester backbone of the hairpin/anchor can be used, where the modification allows binding of the restriction endonuclease to the hairpin/anchor but prevents cleavage of the strand containing the modifications.

For instance, AatII normally cleaves the following sequence:

5′ . . . G-A-C-G-T^C . . . 3′
3′ . . . C^T-G-C-A-G . . . 5′

However, if the normal bonds (“-”) between the nucleotides at one of the cleavage sites were replaced with bonds that are not cleavable by AatII (“=”), then the cleavage pattern would resemble that of a nicking endonuclease:

5′ . . . G-A-C-G-T=C . . . 3′
3′ . . . C^T-G-C-A-G . . . 5′
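The effect of protecting one bond can be modeled by letting the enzyme cut every strand except where the bond at its cleavage point is marked as non-cleavable. This is a sketch with hypothetical names. As the text requires, it assumes that the modification leaves binding intact and blocks cleavage completely. That assumption would have to be verified experimentally for any given enzyme and modification.

```python
# AatII recognizes GACGTC and cuts after the fifth base of each strand (GACGT^C).
SITE, CUT = "GACGTC", 5

def aatii_cuts(top: str, protected_top: set[int],
               protected_bottom: set[int]) -> list[tuple[str, int]]:
    """Cuts AatII can make, as (strand, gap) pairs in top-strand coordinates.

    `top` is written 5'->3' and the bottom strand is its complement; gap i lies
    just before top[i]. A gap listed in `protected_*` holds a non-cleavable bond
    on that strand (e.g. a phosphorothioate, by assumption) and is skipped.
    """
    cuts = []
    start = top.find(SITE)
    while start != -1:
        top_gap = start + CUT                 # 5'-G-A-C-G-T^C-3'
        bottom_gap = start + len(SITE) - CUT  # 3'-C^T-G-C-A-G-5'
        if top_gap not in protected_top:
            cuts.append(("top", top_gap))
        if bottom_gap not in protected_bottom:
            cuts.append(("bottom", bottom_gap))
        start = top.find(SITE, start + 1)
    return cuts

duplex_top = "TTGACGTCAA"
print(aatii_cuts(duplex_top, set(), set()))  # [('top', 7), ('bottom', 3)]
print(aatii_cuts(duplex_top, {7}, set()))    # [('bottom', 3)]: a nick only
```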

The use of endonucleases allows DNA to be cleaved simply, at an exact position, within sequences made of natural DNA bases. Because no special nucleotides are needed at the cleavage site, no additional costs are incurred in constructing the hairpin/anchor sequences. Furthermore, the use of an endonuclease ensures that DNA cleavage produces termini that are substrates for further manipulation by other enzymes such as ligases or polymerases.

Regeneration of single-stranded DNA templates on a sequencing chip or nucleic acid array produces a spatially addressable array in which the DNA sequence at every position on the array is known. Such an array can be treated with a polymerase enzyme and natural dNTPs to produce a double-stranded array that is also spatially addressable, enabling the systematic analysis of DNA-protein interactions.

The density of the single molecule arrays is not critical. However, the present invention can make use of a high density of hairpins/anchors, and such densities are preferable. For example, arrays with a density of 10⁶-10⁹ hairpins/anchors per cm² may be used. Preferably, the density is at least 10⁷/cm² and typically up to 10⁹/cm². These single molecule arrays are in contrast to other arrays which may be described in the art as “high density” but which are not necessarily as dense and/or which do not allow single molecule resolution.

Using the methods and devices of the present invention, it may be possible to image at least 10⁶-10⁹, preferably 10⁷ or 10⁸, hairpins or anchors per cm². Fast sequential imaging may be achieved using a scanning apparatus; shifting and transfer between images may allow higher numbers of hairpins/anchors to be imaged.

The extent of separation between the individual hairpins/anchors on the array will be determined, in part, by the particular technique used to resolve the individual hairpins/anchors. Apparatus used to image molecular arrays are known to those skilled in the art. For example, a confocal scanning microscope may be used to scan the surface of the array with a laser to image directly, by fluorescence, a fluorophore incorporated on the individual hairpins/anchors. Alternatively, a sensitive 2-D detector, such as a charge-coupled device, can be used to provide a 2-D image representing the individual hairpins/anchors on the array.

“Resolving” single hairpins/anchors (and their attached templates and complements) on the array with a 2-D detector can be done if, at 100× magnification, adjacent hairpins/anchors are separated by a distance of at least approximately 250 nm, preferably at least 300 nm and more preferably at least 350 nm. It will be appreciated that these distances are dependent on magnification, and that other values can be determined accordingly by one of ordinary skill in the art.
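The array densities given in this section and the minimum separations above are linked by simple geometry. The sketch below assumes an ideal square grid. This is an assumption: randomly deposited molecules will have many neighbours closer than the grid spacing. The sketch also treats the quoted separations as distances on the array surface. The function names are hypothetical.

```python
import math

NM_PER_CM = 1e7

def grid_spacing_nm(density_per_cm2: float) -> float:
    """Neighbour spacing on an ideal square grid with the given density."""
    return math.sqrt(NM_PER_CM ** 2 / density_per_cm2)

def max_grid_density_per_cm2(min_spacing_nm: float) -> float:
    """Highest square-grid density that keeps neighbours min_spacing_nm apart."""
    return (NM_PER_CM / min_spacing_nm) ** 2

for density in (1e6, 1e7, 1e8, 1e9):
    print(f"{density:.0e} per cm^2 -> {grid_spacing_nm(density):.0f} nm")
for spacing in (250, 300, 350):
    print(f"{spacing} nm -> at most {max_grid_density_per_cm2(spacing):.2e} per cm^2")
```

On this model, 10⁹ hairpins/anchors per cm² corresponds to a spacing of about 316 nm, and a 250 nm minimum separation caps the density at about 1.6 × 10⁹ per cm².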

Other techniques such as scanning near-field optical microscopy (SNOM) are available which are capable of greater optical resolution, thereby permitting denser arrays to be used. For example, using SNOM, adjacent hairpins/anchors may be separated by a distance of less than 100 nm, e.g., 10 nm. For a description of scanning near-field optical microscopy, see Moyer et al., Laser Focus World (1993) 29(10).

An additional technique that may be used is surface-specific total internal reflection fluorescence microscopy (TIRFM; see, for example, Vale et al., Nature (1996) 380:451-453). Using this technique, it is possible to achieve wide-field imaging (up to 100 μm×100 μm) with single molecule sensitivity. This may allow arrays of greater than 10⁷ resolvable hairpins/anchors per cm² to be used.

Additionally, the techniques of scanning tunnelling microscopy (Binnig et al., Helvetica Physica Acta (1982) 55:726-735) and atomic force microscopy (Hansma et al., Ann. Rev. Biophys. Biomol. Struct. (1994) 23:115-139) are suitable for imaging the arrays of the present invention. Other devices which do not rely on microscopy may also be used, provided that they are capable of imaging within discrete areas on a solid support.

Immobilisation to the support may be by specific covalent or non-covalent interactions. Covalent attachment is preferred. The immobilized hairpin/anchor is then able to undergo interactions with other molecules or cognates at positions distant from the solid support. Immobilisation in this manner results in well-separated hairpins/anchors. The advantage of this is that it prevents interaction between neighbouring hairpins/anchors on the array, which may hinder interrogation of the array.

An array containing sequenced and regenerated templates can be used as an addressable platform for spatially organizing libraries of compounds attached to single-stranded DNA tags. For example, a combinatorial library of drug compounds could be prepared with unique single-stranded DNA tags or DNA mimics, e.g., PNA, and then added to a sequenced/regenerated array. This would generate a spatially addressable array of drug compounds on a chip. The same can be done for a protein library. Such chips could then be interrogated with probes to generate information about molecular interactions.

The arrays described herein are, in effect, arrays of individually analyzable template nucleic acids. This has many important benefits for the study of the template sequences and their interaction with other biological molecules. In particular, fluorescence events occurring on each template nucleic acid can be detected using an optical microscope linked to a sensitive detector, resulting in a distinct signal for each template.

When the arrays are used in a multi-step analysis of a population of single templates, the phasing problems (loss of synchronisation) encountered with the high density (multi-molecule) arrays of the prior art can be reduced or removed. In addition, the arrays permit a massively parallel approach to monitoring fluorescent or other events on the templates. Such massively parallel data acquisition makes the arrays extremely useful in a wide range of analysis procedures that involve screening or characterising heterogeneous mixtures of templates.

EXAMPLE 1 Regeneration of Hairpin

Twenty microliters of solution is prepared containing 50 pmoles of a DNA hairpin phosphorylated at its 5′ end, 10 pmoles of a non-phosphorylated double-stranded DNA oligonucleotide, and several thousand units of a DNA ligase enzyme. The oligonucleotide is designed so that one strand is shorter than the other. This makes the oligonucleotide blunt-ended at one end and single-stranded at the other, where the longer strand forms a 5′ overhang. The single-stranded end carries a fluorescent label. The ligase fuses the hairpin and the double-stranded oligonucleotide at their blunt ends only. Because only the 5′ end of the hairpin carries a phosphate group, the reaction joins only one strand to the hairpin: the longer strand, which carries the fluorescent group.

The template is regenerated starting from a solution containing 2.5 pmoles of a fluorescently labeled strand of DNA that has previously been ligated to a blunt DNA hairpin. The single-stranded portion of this DNA construct, i.e., the template strand, can be made double-stranded by using 1 unit of Vent exo⁻ polymerase (New England Biolabs, Inc., Beverly, Mass., USA) to incorporate a mixture of the four nucleotides, each at 25 pmoles per reaction, at 75° C. for 30 minutes. Upon completion, the reaction mixture is purified using a DNA purification kit (Qiagen, Hilden, Germany) and split in two. Half is kept for analysis. The other half (1.25 pmoles) is digested at 55° C. for 30 minutes with N.BstNBI (5 units; New England Biolabs, Inc., Beverly, Mass., USA), which nicks the extended DNA construct proximal to the new synthetic strand. The formation of the synthetic complementary strand by the polymerase, and its removal by digestion with the nicking enzyme, can be analyzed by polyacrylamide gel electrophoresis, which distinguishes the DNA products by their differences in size. The presence of the fluorescent group ensures that the DNA molecules can be easily detected.

This procedure can also be performed with little modification in a flow cell where the substrate comprises DNA ligated to DNA hairpins that are covalently attached to the glass surface of the flow cell. In this case, the attachment of the DNA to a solid support, the glass, obviates the need to use a DNA purification kit between enzyme steps. Instead, products can be removed and new reagents added by flowing solutions through the cell.

EXAMPLE 2 Bisulfite Reaction

In general, the DNA is rendered single-stranded by taking a 20 μl solution of 2-10 μg of genomic DNA fragments, adding 0.3 M NaOH, and incubating at room temperature for 15 minutes. Then 150 μl of 0.6 M hydroquinone containing 3.5 M sodium bisulfite (pH 5) is added, and the mixture is incubated for 10 hours at 50° C. The reaction is then purified using a DNA purification kit (Qiagen, Hilden, Germany).

When the bisulfite reaction is performed on DNA on an array, prior denaturation of the DNA is not required. The DNA will be single-stranded and attached to a hairpin nucleic acid or a double-stranded nucleic acid anchor on a surface. The DNA will have been rendered single-stranded after a sequencing reaction by the action of a nicking endonuclease that cleaves the sequencing strand away from the immobilised template strand. Thus, a 150 μl solution of 0.6 M hydroquinone containing 3.5 M sodium bisulfite (pH 5) is injected onto the array, and the array is then incubated at 50° C. for 5 hours. The array is then washed with water, after which 150 μl of 200 mM NaOH is added and the array is incubated for 20 minutes. The array is next washed with 1 ml of 200 mM HCl, and finally with 5 ml of water. The array is then ready for a second round of sequencing to determine the methylation status of the DNA on the array.
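The comparison between the two sequencing rounds can be sketched as follows, under idealized assumptions. Every unmethylated cytosine is converted to uracil and read as T, and no 5-methylcytosine is converted. Both reads are error-free and aligned to the template. The function names, the example template and the methylated positions are hypothetical.

```python
def bisulfite_read(template: str, methylated: set[int]) -> str:
    """Idealized read of a bisulfite-treated template: unmethylated C becomes U
    (read as T); C at the positions in `methylated` (5-methylcytosine) stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(template)
    )

def methylated_positions(first_read: str, second_read: str) -> list[int]:
    """A C in the first (untreated) read that is still C in the second
    (bisulfite-treated) read is called methylated."""
    if len(first_read) != len(second_read):
        raise ValueError("reads must be aligned and of equal length")
    return [
        i for i, (before, after) in enumerate(zip(first_read, second_read))
        if before == "C" and after == "C"
    ]

first = "ACGTTCGACCGA"                  # hypothetical template, first round
second = bisulfite_read(first, {1, 9})  # hypothetical: two of three CpG Cs methylated
print(second)                              # ACGTTTGATCGA
print(methylated_positions(first, second))  # [1, 9]
```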

All patents, patent applications, and published references cited herein are hereby incorporated by reference in their entirety. While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

1. A method for detecting a methylated cytosine in a template nucleic acid, the method comprising: (a) providing a hairpin-template complex, comprising: (i) a hairpin nucleic acid, wherein the hairpin nucleic acid is self-complementary and has a first restriction site for a nicking endonuclease, said restriction site comprising a recognition sequence and a cleavage site, wherein said recognition sequence is situated so that said cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid, and wherein said hairpin nucleic acid is a self-hybrid; and (ii) a single-stranded template nucleic acid; wherein the 5′ end of the hairpin nucleic acid is attached to the 3′ end of the single-stranded template nucleic acid; (b) sequencing the single-stranded template nucleic acid of the hairpin-template complex, thereby producing: (i) a first sequence; and (ii) a hairpin-template-complement complex, comprising the hairpin-template complex of (a), and further comprising a synthetic nucleic acid strand complementary to the template nucleic acid, wherein the synthetic nucleic acid strand is hybridized to the template nucleic acid, and wherein the complementary nucleic acid strand is attached at its 5′ end to the 3′ end of the hairpin nucleic acid; (c) removing the complementary nucleic acid strand from the hairpin-template-complement complex, thereby recovering the hairpin-template complex; (d) treating the hairpin-template complex with sodium bisulfite, thereby producing a sodium bisulfite-treated template nucleic acid; (e) sequencing the sodium bisulfite-treated template nucleic acid of (d), thereby producing a second sequence; and (f) comparing the first sequence and the second sequence, where the presence of a cytosine in the second sequence indicates that the cytosine at that position is methylated; thereby detecting a methylated cytosine in the template nucleic acid.
2. The method of claim 1, wherein the hairpin nucleic acid is attached to a solid substrate.
3. An addressable array comprising a hairpin-template complex, comprising: (a) a hairpin nucleic acid, wherein the hairpin nucleic acid is self-complementary and has a first restriction site for a nicking endonuclease, said restriction site comprising a recognition sequence and a cleavage site, wherein said recognition sequence is situated so that said cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid, and wherein said hairpin nucleic acid is a self-hybrid, and wherein the hairpin nucleic acid is attached to a solid substrate; and (b) a single-stranded template nucleic acid, wherein the 5′ end of the hairpin nucleic acid is attached to the 3′ end of the single-stranded template nucleic acid.
4. An addressable array, comprising a plurality of the hairpin-template complexes of claim 3, wherein adjacent complexes are separated by a distance of at least 10 nm.
5. The addressable array of claim 4, wherein the complexes are separated by a distance of at least 100 nm.
6. The addressable array of claim 4, wherein the complexes are separated by a distance of at least 250 nm.
 7. The addressable array of claim 4, wherein the density of thecomplexes is from 10⁶ to 10⁹ polynucleotides per cm².
8. The addressable array of claim 4, wherein the density of the complexes is from 10⁷ to 10⁸ molecules per cm².
9. A kit, comprising the addressable array of claim 3.
10. A method for detecting a methylated cytosine in a template nucleic acid, the method comprising: (a) providing an anchor-template complex, comprising: (i) a double-stranded nucleic acid anchor, wherein the double-stranded nucleic acid anchor comprises: (A) a first end and a second end; and (B) a first restriction site for a nicking endonuclease, said restriction site comprising a recognition sequence and a cleavage site, wherein said cleavage site is situated so that said cleavage site is before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor; and (ii) a single-stranded template nucleic acid; wherein the 5′ end of the first end of the double-stranded nucleic acid anchor is attached to the 3′ end of the single-stranded template nucleic acid; (b) sequencing the single-stranded template nucleic acid of the anchor-template complex, thereby producing: (i) a first sequence; and (ii) an anchor-template-complement complex, comprising the anchor-template complex of (a), and further comprising a synthetic nucleic acid strand complementary to the template nucleic acid, wherein the synthetic nucleic acid strand is hybridized to the template nucleic acid, and wherein the complementary nucleic acid strand is attached at its 5′ end to the 3′ end of the first end of the double-stranded nucleic acid anchor; (c) removing the complementary nucleic acid strand from the anchor-template-complement complex, thereby recovering the anchor-template complex; (d) treating the anchor-template complex with sodium bisulfite, thereby producing a sodium bisulfite-treated anchor-template complex; (e) sequencing the sodium bisulfite-treated anchor-template complex of (d), thereby producing a second sequence; and (f) comparing the first sequence and the second sequence, where the presence of a cytosine in the second sequence indicates that the cytosine at that position in the template nucleic acid is methylated; thereby detecting a methylated cytosine in the template nucleic acid.
11. The method of claim 10, wherein the double-stranded nucleic acid anchor is attached at its second end to a solid substrate.
12. An addressable array comprising an anchor-template complex, comprising: (a) a double-stranded nucleic acid anchor, wherein the double-stranded nucleic acid anchor comprises: (i) a first end and a second end; and (ii) a first restriction site for a nicking endonuclease, said restriction site comprising a recognition sequence and a cleavage site, wherein said cleavage site is situated so that said cleavage site is before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor; and (b) a single-stranded template nucleic acid; wherein the 5′ end of the first end of the double-stranded nucleic acid anchor is attached to the 3′ end of the single-stranded template nucleic acid.
13. An addressable array, comprising a plurality of the anchor-template complexes of claim 12, wherein adjacent complexes are separated by a distance of at least 10 nm.
 14. The addressable array of claim 12, wherein the complexes areseparated by a distance of at least 100 nm.
15. The addressable array of claim 12, wherein the complexes are separated by a distance of at least 250 nm.
16. The addressable array of claim 12, wherein the density of the complexes is from 10⁶ to 10⁹ polynucleotides per cm².
17. The addressable array of claim 12, wherein the density of the complexes is from 10⁷ to 10⁸ molecules per cm².
18. A kit, comprising the addressable array of claim 12.
19. A method for detecting a methylated cytosine in a template nucleic acid of known sequence, the method comprising: (a) providing a hairpin-template complex, comprising: (i) a hairpin nucleic acid, wherein the hairpin nucleic acid is self-complementary and has a first restriction site for a nicking endonuclease, said restriction site comprising a recognition sequence and a cleavage site, wherein said recognition sequence is situated so that said cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid, and wherein said hairpin nucleic acid is a self-hybrid; and (ii) a single-stranded template nucleic acid; wherein the 5′ end of the hairpin nucleic acid is attached to the 3′ end of the single-stranded template nucleic acid; (b) treating the hairpin-template complex with sodium bisulfite, thereby producing a sodium bisulfite-treated template nucleic acid; (c) sequencing the sodium bisulfite-treated template nucleic acid of (b), thereby producing a sequence; and (d) comparing the sequence of (c) and the known sequence, where the presence of a cytosine in the sequence of (c) indicates that the cytosine at that position is methylated; thereby detecting a methylated cytosine in the template nucleic acid of known sequence.
20. The method of claim 19, wherein the hairpin nucleic acid is attached to a solid substrate.
21. A method for detecting a methylated cytosine in a template nucleic acid of known sequence, the method comprising: (a) providing an anchor-template complex, comprising: (i) a double-stranded nucleic acid anchor, wherein the double-stranded nucleic acid anchor comprises: (A) a first end and a second end; and (B) a first restriction site for a nicking endonuclease, said restriction site comprising a recognition sequence and a cleavage site, wherein said cleavage site is situated so that said cleavage site is before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor; and (ii) a single-stranded template nucleic acid; wherein the 5′ end of the first end of the double-stranded nucleic acid anchor is attached to the 3′ end of the single-stranded template nucleic acid; (b) treating the anchor-template complex with sodium bisulfite, thereby producing a sodium bisulfite-treated anchor-template complex; (c) sequencing the sodium bisulfite-treated anchor-template complex of (b), thereby producing a sequence; and (d) comparing the sequence of (c) and the known sequence, where the presence of a cytosine in the sequence of (c) indicates that the cytosine at that position in the template nucleic acid is methylated; thereby detecting a methylated cytosine in the template nucleic acid.
22. The method of claim 21, wherein the double-stranded nucleic acid anchor is attached at its second end to a solid substrate.
23. A method for detecting a methylated cytosine in a template nucleic acid of known sequence, wherein one or more of the cytosines in the template nucleic acid have been converted to uracil, the method comprising: (a) providing a hairpin-template complex, comprising: (i) a hairpin nucleic acid, wherein the hairpin nucleic acid is self-complementary and has a first restriction site for a nicking endonuclease, said restriction site comprising a recognition sequence and a cleavage site, wherein said recognition sequence is situated so that said cleavage site is before, at, or beyond the 3′ end of the hairpin nucleic acid, and wherein said hairpin nucleic acid is a self-hybrid; and (ii) a single-stranded template nucleic acid; wherein the 5′ end of the hairpin nucleic acid is attached to the 3′ end of the single-stranded template nucleic acid; (b) sequencing the template nucleic acid, thereby producing a sequence; and (c) comparing the sequence of (b) and the known sequence, where the presence of a cytosine in the sequence of (b) indicates that the cytosine at that position is methylated; thereby detecting a methylated cytosine in the template nucleic acid of known sequence.
24. The method of claim 23, wherein the hairpin nucleic acid is attached to a solid substrate.
25. A method for detecting a methylated cytosine in a template nucleic acid of known sequence, wherein one or more of the cytosines in the template nucleic acid have been converted to uracil, the method comprising: (a) providing an anchor-template complex, comprising: (i) a double-stranded nucleic acid anchor, wherein the double-stranded nucleic acid anchor comprises: (A) a first end and a second end; and (B) a first restriction site for a nicking endonuclease, said restriction site comprising a recognition sequence and a cleavage site, wherein said cleavage site is situated so that said cleavage site is before, at, or beyond the 3′ end of the first end of the double-stranded nucleic acid anchor; and (ii) a single-stranded template nucleic acid; wherein the 5′ end of the first end of the double-stranded nucleic acid anchor is attached to the 3′ end of the single-stranded template nucleic acid; (b) sequencing the anchor-template complex, thereby producing a sequence; and (c) comparing the sequence of (b) and the known sequence, where the presence of a cytosine in the sequence of (b) indicates that the cytosine at that position in the template nucleic acid is methylated; thereby detecting a methylated cytosine in the template nucleic acid.
26. The method of claim 25, wherein the double-stranded nucleic acid anchor is attached at its second end to a solid substrate.
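The comparison step shared by claims 10, 19, 21, 23 and 25 reduces to a position-by-position check. A cytosine in the reference (the first, untreated sequence in claim 10, or the known sequence in claims 19 to 25) that still reads as cytosine after conversion was protected by methylation; one that reads as thymine (from uracil) was not. The Python sketch below is illustrative only. The function names are hypothetical, and it assumes complete bisulfite conversion and reads already aligned to the reference base-for-base, in the template's orientation, without insertions or deletions.

```python
from typing import Iterable, List


def simulate_bisulfite(seq: str, methylated: Iterable[int]) -> str:
    """Hypothetical helper for an idealized conversion: unmethylated C is
    deaminated to U and read as T; positions listed in `methylated`
    (5-methylcytosine) remain C. Assumes 100% conversion efficiency."""
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


def call_methylated_cytosines(reference: str, treated_read: str) -> List[int]:
    """Return 0-based positions where the reference has C and the
    bisulfite-treated read still has C (inferred 5-methylcytosine).

    `reference` is the first (untreated) sequence or the known sequence.
    Both strings are assumed to be aligned base-for-base, with no indels.
    """
    if len(reference) != len(treated_read):
        raise ValueError("reference and treated read must be aligned to equal length")
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference.upper(), treated_read.upper()))
        if ref == "C" and obs == "C"  # any other base (T or U) counts as unmethylated
    ]


if __name__ == "__main__":
    known = "TTCGACGCCA"
    read = simulate_bisulfite(known, methylated={2, 5})
    print(read)                                    # TTCGACGTTA
    print(call_methylated_cytosines(known, read))  # [2, 5]
```

In this toy example the cytosines at positions 7 and 8 are unmethylated, so they read as T and are not called. Real data would additionally need to account for incomplete conversion and sequencing errors before a site is reported as methylated.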
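Claims 13 to 15 bound the spacing between adjacent complexes, while claims 16 and 17 bound their density; the two are related by simple geometry. The sketch below assumes the complexes sit on a square grid (an assumption, since the claims do not specify a layout), so that a pitch of d nanometres gives 10^14 / d² complexes per cm², because 1 cm² = 10^14 nm². Both function names are hypothetical.

```python
import math

NM2_PER_CM2 = 1e14  # 1 cm = 1e7 nm, so 1 cm^2 = 1e14 nm^2


def max_density_per_cm2(min_spacing_nm: float) -> float:
    """Upper bound on complexes per cm^2 when neighbours sit on an
    assumed square grid with pitch `min_spacing_nm`."""
    return NM2_PER_CM2 / min_spacing_nm ** 2


def pitch_nm(density_per_cm2: float) -> float:
    """Square-grid pitch in nanometres that yields the given density."""
    return math.sqrt(NM2_PER_CM2 / density_per_cm2)


if __name__ == "__main__":
    # Spacing floors from claims 13-15.
    for d in (10, 100, 250):
        print(f"spacing >= {d} nm -> at most {max_density_per_cm2(d):.1e} per cm^2")
    # Density bounds from claims 16-17.
    for rho in (1e6, 1e7, 1e8, 1e9):
        print(f"{rho:.0e} per cm^2 -> pitch ~ {pitch_nm(rho):.0f} nm")
```

Under the square-grid assumption, the density range of 10⁶ to 10⁹ per cm² corresponds to a pitch from 10 µm down to about 316 nm. Even the highest of these densities therefore keeps neighbours farther apart than the 250 nm minimum of claim 15.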